
Hello,

I would really like to experiment with the 10M token context window of the Llama 4 Scout model. Groq is one of the very few places that offer this model over an API.

However, the 300,000 TPM limit seems to defeat the purpose. As far as I understand, the API is stateless, so I'd have to send several megatokens in a single request to use the full context window, and the TPM limit will always reject such a request no matter how long I wait?

If there is a solution, please tell me (maybe there *is* a way to make incremental requests that would let me drip-feed the large context at 300,000 TPM?)

Alternatively, would it be possible to request removal of this TPM limit in exchange for a VERY strict RPM limit? I can make do with 1 RPM for my idea, which is basically "stuff the entire docset into the context window, then ask questions or generate code based on the docset". I will of course move to the paid tier if this request can be granted. My experiments are not expected to reach large scale anytime soon, though.
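To make the idea concrete, here is roughly what I have in mind (a rough sketch; the docs folder and the example question are just placeholders, and no actual API call is made here):

```python
from pathlib import Path

# Concatenate every markdown file under docs/ into one big system message,
# then ask a single question per request (so ~1 RPM would be enough for me).
docset = "\n\n".join(
    f"### {path}\n{path.read_text(encoding='utf-8')}"
    for path in sorted(Path("docs").rglob("*.md"))
)

messages = [
    {"role": "system", "content": f"Answer using only this documentation:\n\n{docset}"},
    {"role": "user", "content": "Write example code that uses the hypothetical frobnicate API."},
]

# Because the API is stateless, every request carries the full docset again,
# which is why even a single call can blow past a 300,000 TPM limit on its own.
```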

Unfortunately, Llama 4 Scout currently only runs with a 131k-token context window on Groq. In fact, I don't believe any provider runs it with the full 10M context window. If you'd still like to fit more full-context-window requests in per minute, consider using "service_tier": "flex" or requesting a rate limit increase via Chat With Us.
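For reference, here is a minimal sketch of a request using the flex service tier. The field name matches the option quoted above; the endpoint URL and model ID are assumptions you should double-check against the Groq docs before relying on them:

```python
import os
import requests

# Sketch only: one chat completion request with "service_tier": "flex" in the
# body. Verify the endpoint, model ID, and flex behaviour against the Groq docs.
resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "meta-llama/llama-4-scout-17b-16e-instruct",  # assumed model ID
        "service_tier": "flex",  # flex processing, as suggested above
        "messages": [
            {"role": "system", "content": "You answer questions about the attached docs."},
            {"role": "user", "content": "How do I configure X?"},
        ],
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```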

