Requests taking more than 5s for llama 8b instant model in production on developer plan?

yawnxyz August 1, 2025, 4:57pm 37

Thank you for reporting. I've added these to the issue tracker as well.
