A possible scenario if you had set your Groq key in Hugging Face:
In your case, the simplest consistent explanation is: your calls are being routed to Groq (implicitly), and you are being blocked by a Groq-side “free monthly usage” gate, not by Hugging Face PRO pay-as-you-go.
That sounds contradictory until you separate three things: routing, billing mode, and provider selection.
1) What the URL proves
Your stack trace includes:
https://router.huggingface.co/groq/openai/v1/chat/completions
That is the Hugging Face Router, but it is explicitly the Groq route because the path contains /groq/.
So regardless of what you “meant” to do, this particular request ended up as “Groq provider” at the router layer.
2) Why you got Groq even though you never specified a provider
The default is “auto”
With InferenceClient(model_id, token=...) the provider defaults to provider="auto".
Hugging Face documents that auto routing picks the first available provider for that model, following your preference order in Inference Provider settings. (Hugging Face)
So your code:
client = InferenceClient("meta-llama/Llama-3.3-70B-Instruct", token=hf_token)
means:
Why “it worked for months and then broke” fits auto routing
Auto routing can change “under you” if any of these change:
None of that requires code changes.
3) The big point: PRO pay-as-you-go only applies in one billing mode
Hugging Face has two billing modes for Inference Providers:
Mode A: Routed by Hugging Face (HF billing)
Mode B: Custom Provider Key (provider billing)
Hugging Face’s pricing page explicitly contrasts these two modes. (Hugging Face)
4) So are those models “counted in HF pay-as-you-go”?
If you are in “Routed by Hugging Face”
Then yes: it does not matter that the backend is Groq or Together or some other provider. It is still billed centrally through HF (credits then pay-as-you-go on PRO). (Hugging Face)
If you are in “Custom Provider Key”
Then no: those calls are not counted against HF credits or HF pay-as-you-go at all. They are counted against the provider’s plan and limits. (Hugging Face)
Your specific error message (“free monthly usage limit for groq… add pre-paid credits”) strongly resembles provider-billing behavior. In HF-billed mode, you normally see HF messages about monthly included credits, not “add prepaid credits to your Groq account.” (Many public HF threads show HF’s “exceeded monthly included credits” wording instead.) (Hugging Face Forums)
So my “most likely” read is: you have Groq configured as a custom provider key or otherwise are being treated as Groq-billed traffic.
5) The other common PRO gotcha: pay-as-you-go not actually active
Even if you intend HF billing, 402 can still happen if HF cannot charge you. Hugging Face staff explicitly say a 402 commonly occurs when:
Separately, HF users have reported getting 402 right after exceeding the $2 included credits and asking whether they must “enable permission” for pay-as-you-go. (Hugging Face Forums)
So a second plausible failure path is:
This path usually pairs with HF’s own “included credits exceeded” messaging, but it is still worth checking because it is common. (Hugging Face Forums)
6) The token angle (because you created a new one)
For Inference Providers, HF explicitly requires a fine-grained token with the permission “Make calls to Inference Providers.” (Hugging Face)
If you created a new token but did not grant that permission, you can get blocked or get inconsistent behavior.
That said, your current error is a billing-style 402, not a 401/403. So token permissions are usually “secondary” here, but still part of a complete checklist. (Hugging Face Forums)
7) How I would debug your exact situation (fast, decisive)
Step 1: Prove whether you are HF-billed or Groq-billed
Use the HF pricing model as the rule:
Step 2: Check whether you configured a Groq custom key in HF settings
Practically, you verify by checking:
If a Groq key is present, you are likely in provider-billed mode for Groq traffic, and HF pay-as-you-go is not the thing that saves you. (Hugging Face)
Step 3: Stop using implicit auto during debugging
Auto is convenient, but it hides the root cause. HF documents two clean ways to make routing deterministic:
Set provider explicitly at client initialization (default is auto). (Hugging Face)
Append routing policy/provider to the model id:
:fastest or :cheapest
:<provider> to force a provider (Hugging Face)
If you force a non-Groq provider and you still see /groq/ in the failing URL, then you are not actually exercising that path (meaning some other part of your stack is pinned to Groq via base URL).
Step 4: Make sure HF can actually charge for pay-as-you-go
If you truly want HF pay-as-you-go to kick in after $2, confirm:
8) Your question about “HF Inference API” vs providers
You mentioned “when the model doesn’t have the (HF Inference API) provider.”
That’s a common confusion: A model not being available on HF’s own backend does not imply “not counted” or “not supported.” It usually just means it will be served by another provider. Whether you pay HF or pay the provider depends entirely on the billing mode described above. (Hugging Face)
9) When does the “monthly limit” restart?
It depends on which limit you are actually hitting.
If the limit is Groq spend limits
Groq documents that monthly spend limits reset on the 1st of each month. (GroqCloud)
If the limit is Groq rate limits
Groq rate limits are defined in time windows (per minute and per day). Groq documents RPM, RPD, TPM, TPD, and that you can view exact limits in your account. (GroqCloud)
If the limit is HF monthly credits
HF documents monthly credits (Free $0.10, PRO $2.00) and that credits apply when routing through HF. (Hugging Face)
HF does not state a precise reset timestamp on the pricing table itself, so the practical answer is: check the Inference Providers usage/billing UI for your account’s reset timing.
My “most likely” diagnosis for your case
Ranked by likelihood given your exact error text and URL:
Summary bullets
InferenceClient(...) without provider uses auto, which picks the first available provider by your HF preference order. (Hugging Face)
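To make the URL check and the "stop using implicit auto" advice concrete, here is a minimal sketch. The helper names provider_from_router_url and make_pinned_client are hypothetical, not part of huggingface_hub. The URL helper assumes that provider-specific routes put the provider in the first path segment, as in the failing URL from the stack trace, and it treats a leading v1 segment as "no provider prefix".

```python
from typing import Optional
from urllib.parse import urlparse

ROUTER_HOST = "router.huggingface.co"


def provider_from_router_url(url: str) -> Optional[str]:
    """Return the provider segment of a Hugging Face Router URL, if any.

    Assumption: provider-specific routes look like
    https://router.huggingface.co/<provider>/..., as in the failing URL.
    """
    parsed = urlparse(url)
    if parsed.hostname != ROUTER_HOST:
        return None
    segments = [s for s in parsed.path.split("/") if s]
    if not segments or segments[0] == "v1":
        # No provider prefix in the path, so there is nothing to report.
        return None
    return segments[0]


def make_pinned_client(provider: str, hf_token: str):
    """Build a client with an explicit provider instead of the default "auto"."""
    # Imported lazily so the URL helper works without huggingface_hub installed.
    from huggingface_hub import InferenceClient

    return InferenceClient(provider=provider, token=hf_token)


failing_url = "https://router.huggingface.co/groq/openai/v1/chat/completions"
print(provider_from_router_url(failing_url))  # groq
```

While debugging, something like make_pinned_client("together", hf_token) should keep requests off the Groq route, assuming that provider serves your model. If the failing URL still contains /groq/ after that, some other part of the stack is pinned to Groq.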