429 Rate limits with a single tool call

Hi! I am testing tool calling with both gpt oss and kimi. After a simple tool call (no more than a few hundred tokens in and out), the second call immediately gets a 429. Waiting anywhere from 10 to 30 seconds helps get it through, but I don't see which limit I am hitting. As I see it, I would need to make hundreds or thousands of tool calls per minute.

Any ideas what I might be doing wrong? I am using Groq's own Python library, with a completion limit of 2k or 4k tokens (I tried both).

Hi there, are you using browser search? With reasoning_effort set to medium or high, it can often search a lot of webpages, filling up the context very quickly and causing a lot of tokens to be used.
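
If it helps to see which limit is being hit, here is a rough sketch that pulls the rate-limit fields out of a response's headers. The header names (retry-after, x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens) are what I understand Groq to return, so treat them as assumptions and check them against your own responses. summarize_rate_limit is just a hypothetical helper name, not part of the groq library.

```python
from typing import Mapping, Optional


def _to_number(value: Optional[str], cast):
    """Convert a header value with `cast`, returning None if missing or malformed."""
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def summarize_rate_limit(headers: Mapping[str, str]) -> dict:
    """Collect rate-limit fields from HTTP response headers.

    Header names are assumed, not confirmed in this thread.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "retry_after_seconds": _to_number(lowered.get("retry-after"), float),
        "remaining_requests": _to_number(
            lowered.get("x-ratelimit-remaining-requests"), int
        ),
        "remaining_tokens": _to_number(
            lowered.get("x-ratelimit-remaining-tokens"), int
        ),
    }


if __name__ == "__main__":
    # Made-up header values for illustration only, not a real response.
    example = {
        "Retry-After": "12",
        "x-ratelimit-remaining-requests": "999",
        "x-ratelimit-remaining-tokens": "37",
    }
    print(summarize_rate_limit(example))
```

With the groq library, I believe the headers are available on the exception raised for a 429 (something like err.response.headers), but I have not verified that here. If remaining tokens is low while remaining requests is still high, the token budget is the likely culprit rather than the number of calls.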