Free monthly limit reached

A possible scenario if you had set your Groq key in Hugging Face:


In your case, the simplest consistent explanation is: your calls are being routed to Groq (implicitly), and you are being blocked by a Groq-side “free monthly usage” gate, not by Hugging Face PRO pay-as-you-go.

That sounds contradictory until you separate three things: routing, billing mode, and provider selection.


1) What the URL proves

Your stack trace includes:

/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fgroq%2Fopenai%2Fv1%2Fchat%2Fcompletions%3C%2Fcode%3E%3C%2Fp%3E

That is the Hugging Face Router, but it is explicitly the Groq route because the path contains /groq/.

So regardless of what you “meant” to do, this particular request ended up as “Groq provider” at the router layer.


2) Why you got Groq even though you never specified a provider

The default is “auto”

With InferenceClient(model_id, token=...) the provider defaults to provider="auto".

Hugging Face documents that auto routing picks the first available provider for that model, following your preference order in Inference Provider settings. (Hugging Face)

So your code:

client = InferenceClient("meta-llama/Llama-3.3-70B-Instruct", token=hf_token)

means:

  • “Pick whichever provider is first in my HF provider preference list that can serve this model.”
  • If that provider becomes unavailable or can’t serve the model, auto can effectively shift to another provider (failover behavior is part of why “auto” exists). (Hugging Face)

Why “it worked for months and then broke” fits auto routing

Auto routing can change “under you” if any of these change:

  • You changed provider preference order in HF settings.
  • You added a provider API key in settings.
  • A provider’s availability for a model changed, so the “first available” provider changed.

None of that requires code changes.


3) The big point: PRO pay-as-you-go only applies in one billing mode

Hugging Face has two billing modes for Inference Providers:

Mode A: Routed by Hugging Face (HF billing)

  • HF routes to the provider.
  • Billing is on your HF account.
  • Your monthly credits apply.
  • PRO supports extra usage pay-as-you-go. (Hugging Face)

Mode B: Custom Provider Key (provider billing)

  • You put your provider API key (Groq key, Together key, etc.) into HF settings.
  • HF still routes, but billing is directly on the provider.
  • HF monthly credits do not apply.
  • You need a provider account. (Hugging Face)

Hugging Face’s pricing page explicitly contrasts these two modes. (Hugging Face)


4) So are those models “counted in HF pay-as-you-go”?

If you are in “Routed by Hugging Face”

Then yes: it does not matter that the backend is Groq or Together or some other provider. It is still billed centrally through HF (credits then pay-as-you-go on PRO). (Hugging Face)

If you are in “Custom Provider Key”

Then no: those calls are not counted against HF credits or HF pay-as-you-go at all. They are counted against the provider’s plan and limits. (Hugging Face)

Your specific error message (“free monthly usage limit for groq… add pre-paid credits”) strongly resembles provider-billing behavior. In HF-billed mode, you normally see HF messages about monthly included credits, not “add prepaid credits to your Groq account.” (Many public HF threads show HF’s “exceeded monthly included credits” wording instead.) (Hugging Face Forums)

So my “most likely” read is: you have Groq configured as a custom provider key or otherwise are being treated as Groq-billed traffic.


5) The other common PRO gotcha: pay-as-you-go not actually active

Even if you intend HF billing, 402 can still happen if HF cannot charge you.

Hugging Face staff explicitly say a 402 commonly occurs when:

  • there is no payment method on the account, or
  • the token lacks the right permissions. (Hugging Face Forums)

Separately, HF users have reported getting 402 right after exceeding the $2 included credits and asking whether they must “enable permission” for pay-as-you-go. (Hugging Face Forums)

So a second plausible failure path is:

  • you burned through the $2 included credits
  • pay-as-you-go isn’t actually chargeable due to billing setup
  • HF blocks with 402

This path usually pairs with HF’s own “included credits exceeded” messaging, but it is still worth checking because it is common. (Hugging Face Forums)


6) The token angle (because you created a new one)

For Inference Providers, HF explicitly requires a fine-grained token with the permission “Make calls to Inference Providers.” (Hugging Face)

If you created a new token but did not grant that permission, you can get blocked or get inconsistent behavior.

That said, your current error is a billing-style 402, not a 401/403. So token permissions are usually “secondary” here, but still part of a complete checklist. (Hugging Face Forums)


7) How I would debug your exact situation (fast, decisive)

Step 1: Prove whether you are HF-billed or Groq-billed

Use the HF pricing model as the rule:

  • If you are HF-routed, your monthly credits apply and extra usage goes to HF pay-as-you-go (on PRO). (Hugging Face)
  • If you are using a custom provider key, credits do not apply and the provider bills you. (Hugging Face)

Practically, you verify by checking:

  • HF Inference Providers usage breakdown and whether your calls are recorded as HF-billed usage (HF provides usage breakdown tooling). (Hugging Face)
  • Groq Console usage/limits if you suspect Groq-billed traffic (Groq rate/limit concepts are documented). (GroqCloud)

Step 2: Check whether you configured a Groq custom key in HF settings

If a Groq key is present, you are likely in provider-billed mode for Groq traffic, and HF pay-as-you-go is not the thing that saves you. (Hugging Face)

Step 3: Stop using implicit auto during debugging

Auto is convenient, but it hides the root cause.

HF documents two clean ways to make routing deterministic:

  1. Set provider explicitly at client initialization (default is auto). (Hugging Face)

  2. Append routing policy/provider to the model id:

    • :fastest or :cheapest
    • or :<provider> to force a provider (Hugging Face)

If you force a non-Groq provider and you still see /groq/ in the failing URL, then you are not actually exercising that path (meaning some other part of your stack is pinned to Groq via base URL).

Step 4: Make sure HF can actually charge for pay-as-you-go

If you truly want HF pay-as-you-go to kick in after $2, confirm:

  • You have a payment method set.
  • Billing can generate invoices. HF says invoices are issued on the 1st for accrued usage (compute billing context, but it confirms how HF billing operates). (Hugging Face)

8) Your question about “HF Inference API” vs providers

You mentioned “when the model doesn’t have the (HF Inference API) provider.”

That’s a common confusion:

  • “HF Inference API / hf-inference” is one backend.
  • Inference Providers is a routing layer that can send requests to multiple third-party backends.

A model not being available on HF’s own backend does not imply “not counted” or “not supported.” It usually just means it will be served by another provider. Whether you pay HF or pay the provider depends entirely on the billing mode described above. (Hugging Face)


9) When does the “monthly limit” restart?

It depends on which limit you are actually hitting.

If the limit is Groq spend limits

Groq documents that monthly spend limits reset on the 1st of each month. (GroqCloud)

If the limit is Groq rate limits

Groq rate limits are defined in time windows (per minute and per day). Groq documents RPM, RPD, TPM, TPD, and that you can view exact limits in your account. (GroqCloud)

If the limit is HF monthly credits

HF documents monthly credits (Free $0.10, PRO $2.00) and that credits apply when routing through HF. (Hugging Face)
HF does not state a precise reset timestamp on the pricing table itself, so the practical answer is: check the Inference Providers usage/billing UI for your account’s reset timing.


My “most likely” diagnosis for your case

Ranked by likelihood given your exact error text and URL:

  1. You are effectively in Groq-billed mode (Custom Provider Key), and Groq free monthly allowance is exhausted. HF PRO is irrelevant in that mode. (Hugging Face)
  2. Auto routing picked Groq (because Groq is first available by your preference order), and you hit a Groq-side cap. This can happen even without a code change. (Hugging Face)
  3. HF pay-as-you-go is not chargeable (payment method or permissions), so HF blocks with 402 after credits. This is common, but the wording in your message points more strongly to (1). (Hugging Face Forums)

Summary bullets

  • InferenceClient(...) without provider uses auto, which picks the first available provider by your HF preference order. (Hugging Face)
  • HF PRO pay-as-you-go only applies when requests are Routed by Hugging Face. Custom provider keys bypass HF billing. (Hugging Face)
  • Your error text reads like Groq-side billing/quota, which strongly suggests a Groq-billed path. (GroqCloud)
  • Groq spend limits reset on the 1st. Rate limits reset by their time window. (GroqCloud)