Headers report lower rate limits than expected? #176899
rabo-unumed asked this question in Models
Select Topic Area
Question
Body
I also brought this up in Azure/github-models#22, but thought it would be good to reach out on the boards as well, since this is more of a general question about rate limits.
I am using gpt-4.1-mini through the GitHub Models organization inference endpoint. We have activated paid usage.
I keep hitting rate limit errors, and from the header info I can retrieve, it appears I only have 150,000 tokens (per minute, presumably):
'x-ratelimit-limit-tokens': '150000'
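For reference, this is roughly how I am reading the headers (a minimal sketch; the org inference URL, the placeholder org name, and the model ID reflect my assumptions about the setup, not an official example):

```python
import os
import requests

# Hypothetical org slug; the token must be permitted to use GitHub Models.
ORG = "my-org"
url = f"https://models.github.ai/orgs/{ORG}/inference/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "ping"}],
}

resp = requests.post(url, headers=headers, json=payload)

# Dump whichever rate-limit headers the endpoint returns,
# e.g. x-ratelimit-limit-tokens / x-ratelimit-remaining-tokens.
for name, value in resp.headers.items():
    if name.lower().startswith("x-ratelimit"):
        print(f"{name}: {value}")
```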
I cannot find any mention of that being the official rate limit, either here https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits?tabs=REST or here https://docs.github.com/en/github-models/use-github-models/prototyping-with-ai-models#rate-limits
Are the listings outdated? Am I misinterpreting the header limit? Is it the limit for a smaller time unit than a minute?
I tried gpt-5-mini as well and got a larger limit of 'x-ratelimit-limit-tokens': '500000', which does not make me any less confused.