Headers report lower rate limits than expected? #176899
rabo-unumed asked this question in Models
Select Topic Area
Question
Body
I also brought this up in Azure/github-models#22, but thought it would be good to reach out on the boards as well, since this is more of a general question about rate limits.
I am using gpt-4.1-mini through the GitHub Models organization inference endpoint. We have activated paid usage.
I keep hitting rate limit errors, and from the header info I can retrieve, it appears I only have 150,000 tokens (per minute, presumably):
'x-ratelimit-limit-tokens': '150000'
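For reference, this is roughly how I am reading the headers (a minimal sketch; the org inference URL, the placeholder org name, and the model ID reflect my assumptions about the setup, not an official example):

```python
import os
import requests

# Hypothetical org slug; the token must be permitted to use GitHub Models.
ORG = "my-org"
url = f"https://models.github.ai/orgs/{ORG}/inference/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "ping"}],
}

resp = requests.post(url, headers=headers, json=payload)

# Dump whichever rate-limit headers the endpoint returns,
# e.g. x-ratelimit-limit-tokens / x-ratelimit-remaining-tokens.
for name, value in resp.headers.items():
    if name.lower().startswith("x-ratelimit"):
        print(f"{name}: {value}")
```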
I cannot find any mention of that being the official rate limit, either here https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits?tabs=REST or here https://docs.github.com/en/github-models/use-github-models/prototyping-with-ai-models#rate-limits
Are the listings outdated? Am I misinterpreting the header limit? Is it the limit for a smaller time unit than a minute?
I tried gpt-5-mini as well and got a larger limit of 'x-ratelimit-limit-tokens': '500000', which does not make me any less confused.