Free monthly limit reached

I have been using a few models for months now, and I have never had this problem. i have google colab pro and I still have a lot of inference providers left.
Im receiving this message (no matter which provider I choose) and I dont really know what it means since supposedly there isnt a limit.

I tried creating a new token and still have the same problem,

raise _format(HfHubHTTPError, str(e), response) from e
huggingface_hub.errors.HfHubHTTPError: 402 Client Error: Payment Required for url: /static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fgroq%2Fopenai%2Fv1%2Fchat%2Fcompletions%3C%2Fa%3E (Request ID: Root=)

You have reached the free monthly usage limit for groq. Add pre-paid credits to your account to continue using groq.

1 Like

Is the provider fixed to Groq because the endpoint URL contains Groq?


You are not hitting a “model limit” in Colab. You are hitting a billing and credits limit in the Hugging Face Router path that calls Groq.

Your error shows the key detail:

  • URL: /static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fgroq%2Fopenai%2Fv1%2Fchat%2Fcompletions%3C%2Fcode%3E%3C%2Fli%3E
  • HTTP: 402 Payment Required
  • Message: “free monthly usage limit for groq … add pre-paid credits …”

That means the request is being routed through Hugging Face’s Inference Providers router and the provider being used is Groq (because the path literally includes /groq/). (Hugging Face)


Background: what “Inference Providers” really is

1) Colab Pro vs Inference Providers

  • Google Colab Pro buys you more or better Google compute (GPU/TPU time, RAM, runtime priority).
  • It does not buy you unlimited usage on external hosted inference APIs like Hugging Face Router or Groq.

So “I have Colab Pro” does not affect this 402.

2) Hugging Face Inference Providers uses “monthly credits”

Hugging Face runs a central “router” that can send your request to many providers (Groq, Together, Hyperbolic, Fireworks, etc.). (Hugging Face)

But usage is not unlimited. Hugging Face gives monthly dollar credits:

  • Free: $0.10/month (no pay-as-you-go)
  • PRO: $2.00/month (pay-as-you-go allowed)
  • Team/Enterprise: $2.00 per seat/month (pay-as-you-go allowed) (Hugging Face)

This is the most common reason people suddenly see 402 Payment Required after “it worked for months.” It simply means the free monthly credits are now exhausted. (Hugging Face)

3) Two billing modes exist and they matter

Hugging Face explicitly supports two ways to pay:

  1. Routed by Hugging Face

    • HF applies your monthly credits.
    • HF bills you (pay-as-you-go only if PRO/Enterprise). (Hugging Face)
  2. Custom Provider Key

    • You add your own Groq (or other provider) API key in HF settings.
    • HF credits do not apply. The provider bills you directly. (Hugging Face)

So a “free monthly limit reached” can be:

  • HF free credits exhausted (HF-routed), or
  • provider-side credits/quota exhausted (custom provider key), depending on how you configured it.

Why you see it “no matter which provider I choose”

Because your request is pinned to Groq by the URL:

router.huggingface.co/groq/openai/v1/...

If your code sets base_url to a Groq-specific router path, you will hit Groq every time, even if you think you selected another provider elsewhere. This is a very common configuration mistake. (Hugging Face Forums)

Correct “provider-switchable” usage uses the unified endpoint:

/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fv1%3C%2Fcode%3E (Hugging Face)


Why creating a new token did not help

A Hugging Face access token is just authentication. It does not refill monthly credits. Credits are account-level and reset monthly per Hugging Face’s billing rules. (Hugging Face)

Also, there is a second class of failures where token permissions or missing billing setup triggers similar issues. Hugging Face staff explicitly mention that 402 can happen if there is no payment method on the account or the token lacks the right permissions. (Hugging Face Forums)


What to do next (simple decision tree)

Step 1: Confirm which “limit” you hit

  1. Check your Inference Providers Settings usage breakdown (past month by model and provider). Hugging Face documents that this view exists in settings. (Hugging Face)
  2. Check whether you set a custom Groq provider key in HF settings (Custom Provider Key mode). HF explains custom keys and that HF can “swap” auth when routing. (Hugging Face)

If HF-routed and you are on Free, the answer is usually: you used up the $0.10 monthly credits. (Hugging Face)

Step 2: Fix the “provider selection” problem in your code

If you want to switch providers, do not call /groq/... in the base URL.

Use:

from openai import OpenAI
import os

client = OpenAI(
    base_url="/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fv1",
    api_key=os.environ["HF_TOKEN"],
)

This is exactly how Hugging Face’s “Getting Started” page shows the OpenAI-compatible setup. (Hugging Face)

Then select provider via the library’s provider parameter (if using InferenceClient) or via whatever provider selection mechanism your wrapper uses. Example from HF docs for InferenceClient explicitly sets provider="together". (Hugging Face)

Step 3: Choose how you want to pay

You have three practical options:

  1. Stay HF-routed, upgrade to PRO

    • You get $2 monthly credits.
    • You get pay-as-you-go after credits are exhausted. (Hugging Face)
  2. Stay HF-routed, but bill an Organization (if you have one)

    • HF supports org billing via X-HF-Bill-To or bill_to. (Hugging Face)
  3. Bypass HF Router and call Groq directly

    • Use Groq’s OpenAI-compatible API base URL: https://api.groq.com/openai/v1. (GroqCloud)
    • This avoids HF monthly credits entirely, and uses Groq’s own account limits and billing.

Groq’s official docs show the exact OpenAI client configuration for that base URL. (GroqCloud)


“Similar cases” online (what others report)

These are very close matches to your situation:

  • HF forum: users getting HfHubHTTPError: 402 and “exceeded monthly included credits,” including examples on router.huggingface.co/together/.... (Hugging Face Forums)
  • HF forum: same 402 but routed to another provider (hyperbolic) through the HF router, showing it is not just Groq. (Hugging Face Forums)
  • HF forum replies: staff/community note 402 can be tied to payment method missing or token permission settings. (Hugging Face Forums)

Reference links (copy/paste)

Hugging Face Inference Providers pricing and monthly credits:
https://huggingface.co/docs/inference-providers/en/pricing

Hugging Face “Getting Started” showing base_url=/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fv1%3A
https://huggingface.co/inference/get-started

Hugging Face forum thread with 402 “monthly included credits” examples:
/static-proxy?url=https%3A%2F%2Fdiscuss.huggingface.co%2Ft%2Fhugging-face-payment-error-402-youve-exceeded-monthly-quota%2F144968

Groq OpenAI compatibility and base_url=https://api.groq.com/openai/v1:
https://console.groq.com/docs/openai

Quick summary

  • Your URL contains /groq/, so you are using Groq even if you think you switched providers. (Hugging Face)
  • Hugging Face free tier has a monthly credit cap ($0.10) and Free has no pay-as-you-go. (Hugging Face)
  • New tokens do not reset credits. Credits are account-level. (Hugging Face)
  • Fix: use /static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fv1%3C%2Fcode%3E if you want switching, or call Groq directly at https://api.groq.com/openai/v1. (Hugging Face)

Thank you! The biggest doubt I have is, I have PRO: $2.00/month (pay-as-you-go allowed). And the way im using the models is: client = InferenceClient(“meta-llama/Llama-3.3-70B-Instruct”, token = hf_token), with a few different models (Llama, deepseek, grok) never specifying the provider. I guess it automatically chooses one and I had never had this message before.

Does this mean this models arent counted in the hugging face “pay-as-you-go”? Like, do I need to also pay for the actual provider when the model doesnt have the (HF Inference API) provider?
And if so, is there a way to know when this monthly limit (in the provider) restarts?

Thank you and sorry for the questions! I guess I didnt really realized how it worked since I have been using hugging face, with several models, for months and I had never reached this limit.

1 Like

I have PRO

Hmm… Hugging Face Pro? Colab Pro? If Colab Pro, it will not affect the quota.

A possible scenario if you had set your Groq key in Hugging Face:


In your case, the simplest consistent explanation is: your calls are being routed to Groq (implicitly), and you are being blocked by a Groq-side “free monthly usage” gate, not by Hugging Face PRO pay-as-you-go.

That sounds contradictory until you separate three things: routing, billing mode, and provider selection.


1) What the URL proves

Your stack trace includes:

/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fgroq%2Fopenai%2Fv1%2Fchat%2Fcompletions%3C%2Fcode%3E%3C%2Fp%3E

That is the Hugging Face Router, but it is explicitly the Groq route because the path contains /groq/.

So regardless of what you “meant” to do, this particular request ended up as “Groq provider” at the router layer.


2) Why you got Groq even though you never specified a provider

The default is “auto”

With InferenceClient(model_id, token=...) the provider defaults to provider="auto".

Hugging Face documents that auto routing picks the first available provider for that model, following your preference order in Inference Provider settings. (Hugging Face)

So your code:

client = InferenceClient("meta-llama/Llama-3.3-70B-Instruct", token=hf_token)

means:

  • “Pick whichever provider is first in my HF provider preference list that can serve this model.”
  • If that provider becomes unavailable or can’t serve the model, auto can effectively shift to another provider (failover behavior is part of why “auto” exists). (Hugging Face)

Why “it worked for months and then broke” fits auto routing

Auto routing can change “under you” if any of these change:

  • You changed provider preference order in HF settings.
  • You added a provider API key in settings.
  • A provider’s availability for a model changed, so the “first available” provider changed.

None of that requires code changes.


3) The big point: PRO pay-as-you-go only applies in one billing mode

Hugging Face has two billing modes for Inference Providers:

Mode A: Routed by Hugging Face (HF billing)

  • HF routes to the provider.
  • Billing is on your HF account.
  • Your monthly credits apply.
  • PRO supports extra usage pay-as-you-go. (Hugging Face)

Mode B: Custom Provider Key (provider billing)

  • You put your provider API key (Groq key, Together key, etc.) into HF settings.
  • HF still routes, but billing is directly on the provider.
  • HF monthly credits do not apply.
  • You need a provider account. (Hugging Face)

Hugging Face’s pricing page explicitly contrasts these two modes. (Hugging Face)


4) So are those models “counted in HF pay-as-you-go”?

If you are in “Routed by Hugging Face”

Then yes: it does not matter that the backend is Groq or Together or some other provider. It is still billed centrally through HF (credits then pay-as-you-go on PRO). (Hugging Face)

If you are in “Custom Provider Key”

Then no: those calls are not counted against HF credits or HF pay-as-you-go at all. They are counted against the provider’s plan and limits. (Hugging Face)

Your specific error message (“free monthly usage limit for groq… add pre-paid credits”) strongly resembles provider-billing behavior. In HF-billed mode, you normally see HF messages about monthly included credits, not “add prepaid credits to your Groq account.” (Many public HF threads show HF’s “exceeded monthly included credits” wording instead.) (Hugging Face Forums)

So my “most likely” read is: you have Groq configured as a custom provider key or otherwise are being treated as Groq-billed traffic.


5) The other common PRO gotcha: pay-as-you-go not actually active

Even if you intend HF billing, 402 can still happen if HF cannot charge you.

Hugging Face staff explicitly say a 402 commonly occurs when:

  • there is no payment method on the account, or
  • the token lacks the right permissions. (Hugging Face Forums)

Separately, HF users have reported getting 402 right after exceeding the $2 included credits and asking whether they must “enable permission” for pay-as-you-go. (Hugging Face Forums)

So a second plausible failure path is:

  • you burned through the $2 included credits
  • pay-as-you-go isn’t actually chargeable due to billing setup
  • HF blocks with 402

This path usually pairs with HF’s own “included credits exceeded” messaging, but it is still worth checking because it is common. (Hugging Face Forums)


6) The token angle (because you created a new one)

For Inference Providers, HF explicitly requires a fine-grained token with the permission “Make calls to Inference Providers.” (Hugging Face)

If you created a new token but did not grant that permission, you can get blocked or get inconsistent behavior.

That said, your current error is a billing-style 402, not a 401/403. So token permissions are usually “secondary” here, but still part of a complete checklist. (Hugging Face Forums)


7) How I would debug your exact situation (fast, decisive)

Step 1: Prove whether you are HF-billed or Groq-billed

Use the HF pricing model as the rule:

  • If you are HF-routed, your monthly credits apply and extra usage goes to HF pay-as-you-go (on PRO). (Hugging Face)
  • If you are using a custom provider key, credits do not apply and the provider bills you. (Hugging Face)

Practically, you verify by checking:

  • HF Inference Providers usage breakdown and whether your calls are recorded as HF-billed usage (HF provides usage breakdown tooling). (Hugging Face)
  • Groq Console usage/limits if you suspect Groq-billed traffic (Groq rate/limit concepts are documented). (GroqCloud)

Step 2: Check whether you configured a Groq custom key in HF settings

If a Groq key is present, you are likely in provider-billed mode for Groq traffic, and HF pay-as-you-go is not the thing that saves you. (Hugging Face)

Step 3: Stop using implicit auto during debugging

Auto is convenient, but it hides the root cause.

HF documents two clean ways to make routing deterministic:

  1. Set provider explicitly at client initialization (default is auto). (Hugging Face)

  2. Append routing policy/provider to the model id:

    • :fastest or :cheapest
    • or :<provider> to force a provider (Hugging Face)

If you force a non-Groq provider and you still see /groq/ in the failing URL, then you are not actually exercising that path (meaning some other part of your stack is pinned to Groq via base URL).

Step 4: Make sure HF can actually charge for pay-as-you-go

If you truly want HF pay-as-you-go to kick in after $2, confirm:

  • You have a payment method set.
  • Billing can generate invoices. HF says invoices are issued on the 1st for accrued usage (compute billing context, but it confirms how HF billing operates). (Hugging Face)

8) Your question about “HF Inference API” vs providers

You mentioned “when the model doesn’t have the (HF Inference API) provider.”

That’s a common confusion:

  • “HF Inference API / hf-inference” is one backend.
  • Inference Providers is a routing layer that can send requests to multiple third-party backends.

A model not being available on HF’s own backend does not imply “not counted” or “not supported.” It usually just means it will be served by another provider. Whether you pay HF or pay the provider depends entirely on the billing mode described above. (Hugging Face)


9) When does the “monthly limit” restart?

It depends on which limit you are actually hitting.

If the limit is Groq spend limits

Groq documents that monthly spend limits reset on the 1st of each month. (GroqCloud)

If the limit is Groq rate limits

Groq rate limits are defined in time windows (per minute and per day). Groq documents RPM, RPD, TPM, TPD, and that you can view exact limits in your account. (GroqCloud)

If the limit is HF monthly credits

HF documents monthly credits (Free $0.10, PRO $2.00) and that credits apply when routing through HF. (Hugging Face)
HF does not state a precise reset timestamp on the pricing table itself, so the practical answer is: check the Inference Providers usage/billing UI for your account’s reset timing.


My “most likely” diagnosis for your case

Ranked by likelihood given your exact error text and URL:

  1. You are effectively in Groq-billed mode (Custom Provider Key), and Groq free monthly allowance is exhausted. HF PRO is irrelevant in that mode. (Hugging Face)
  2. Auto routing picked Groq (because Groq is first available by your preference order), and you hit a Groq-side cap. This can happen even without a code change. (Hugging Face)
  3. HF pay-as-you-go is not chargeable (payment method or permissions), so HF blocks with 402 after credits. This is common, but the wording in your message points more strongly to (1). (Hugging Face Forums)

Summary bullets

  • InferenceClient(...) without provider uses auto, which picks the first available provider by your HF preference order. (Hugging Face)
  • HF PRO pay-as-you-go only applies when requests are Routed by Hugging Face. Custom provider keys bypass HF billing. (Hugging Face)
  • Your error text reads like Groq-side billing/quota, which strongly suggests a Groq-billed path. (GroqCloud)
  • Groq spend limits reset on the 1st. Rate limits reset by their time window. (GroqCloud)

Hi @adminvohcolab ! We recently migrated to a new billing system. If you’ve used up the $2 in free inference credits that come with your PRO sub (or are about to), you’ll need to add credits to your billing settings to continue using Inference Providers. When adding credits, we recommend enabling automatic recharge to ensure uninterrupted usage and the best possible experience.

If you have other questions about billing, let us know at [email protected] :hugs:

1 Like