I am subscribed to the $9 Pro plan and I am using it to generate a synthetic dataset, so my usage is fairly heavy.
I’m also interested in this, as I heavily rely on the Inference API (making 1 request per 10 seconds for 24 hours). I searched the documentation but couldn’t find relevant information.
For reference, here’s the code I use to send requests:
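import asyncio
from huggingface_hub import AsyncInferenceClient

async def main():
    client = AsyncInferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
    chat = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, can I reach the moon by jumping?"},
    ]
    response = await client.chat_completion(chat, max_tokens=100, temperature=0.1)
    print(response.choices[0].message.content)  # text of the first completion

asyncio.run(main())  # in a notebook, use `await main()` instead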
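In case it helps anyone compare usage, here is a rough sketch of how the "one request every 10 seconds" pacing could be done. The names `run_paced`, `send`, and `interval_s` are my own placeholders and not part of `huggingface_hub`. The sketch takes any async callable, so it says nothing about what rate limits the API actually enforces.

```python
import asyncio
import time

async def run_paced(prompts, send, interval_s=10.0):
    """Await `send(prompt)` for each prompt, starting calls at least `interval_s` apart.

    `send` is a hypothetical async callable supplied by the caller.
    """
    results = []
    for i, prompt in enumerate(prompts):
        started = time.monotonic()
        results.append(await send(prompt))
        # Sleep only for what is left of the interval after the call returned,
        # and skip the pause after the final prompt.
        remaining = interval_s - (time.monotonic() - started)
        if remaining > 0 and i < len(prompts) - 1:
            await asyncio.sleep(remaining)
    return results

# One request every 10 seconds, sustained for 24 hours.
REQUESTS_PER_DAY = 24 * 60 * 60 // 10  # 8640
```

To use it with the snippet above, `send` could be a small async function that builds the chat list for a prompt and awaits `client.chat_completion(...)`. Requests run one at a time here, which keeps the pace predictable at the cost of throughput.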
I run quite a bit of inference and I was charged only the $9 a month, so it seems…