Hello,
I would really like to experiment with the 10M token context window of the Llama 4 Scout model. Groq is one of the very few places that offer this model over an API.
However, the 300,000 TPM limit seems to defeat the purpose. As far as I understand, the API is stateless, so I'd have to send several megatokens in a single request to actually use the context window, and the TPM limit will always reflect that one request's size no matter how long I wait between requests?
If there is a solution, please tell me (maybe there *is* a way to make incremental requests that would let me drip-feed the large context in at 300,000 TPM?)
Alternatively, would it be possible to request removal of this TPM limit in exchange for a VERY strict RPM limit? I could make do with 1 RPM for my idea, which is basically "stuff the entire docset into the context window, then ask questions or generate code based on the docset". I will of course move to the paid tier if this request can be granted. My experiments are not expected to reach large scale anytime soon, though.
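For concreteness, here is roughly what a single request would look like for me (a minimal sketch using the `groq` Python client; the model ID, the `docs/` layout, and the sample question are just my assumptions):

```python
# Minimal sketch of the intended workflow, assuming the `groq` Python client
# and its OpenAI-compatible chat completions endpoint.
from pathlib import Path

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Concatenate the whole docset. This single string is what blows past 300k TPM,
# since the full text must be resent with every stateless request.
docset = "\n\n".join(
    p.read_text(encoding="utf-8") for p in sorted(Path("docs").glob("**/*.md"))
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",  # assumed Scout model ID on Groq
    messages=[
        {
            "role": "system",
            "content": "Answer using only the documentation below.\n\n" + docset,
        },
        {"role": "user", "content": "How do I configure X?"},  # hypothetical question
    ],
)
print(response.choices[0].message.content)
```

With the docset inlined like this, every single question re-pays the full multi-megatoken context cost, which is why the per-request TPM accounting blocks me even at 1 RPM.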