Requests taking more than 5s for llama 8b instant model in production on developer plan?

RahulVerma · July 30, 2025, 1:18am

I am seeing huge latency on requests to the llama 8b instant model in production. Is anyone else facing this? The status page is not showing any issues.

yawnxyz · July 30, 2025, 5:34pm

Thanks for reporting, we’re looking into this

yawnxyz · July 30, 2025, 6:14pm

Could you please paste your request IDs below so we can trace the errors?

yawnxyz · July 30, 2025, 9:15pm

The issue resolved itself after a few hours, but here are a few request IDs:
req_01k1cd3te2ehttgeshnwrcy5js
req_01k1cdg0sxem2tbs7f35ccbza4
req_01k1ccspt1f16trf6fajp3wsa0
req_01k1cct27ceg3r8bj20bkz8ek4

qeeebo · August 1, 2025, 3:18am

I have also been having a lot of problems with this model in batch mode. It only recently started to behave better, but it still seems slow. Previously, a batch of 40k prompts finished in 15-25 minutes; in the last 2 days, a single batch took over 10 hours.
Screenshot of it working a bit better now:
req_01k1hsrhjwfsp98s0rqgpnvga1
req_01k1hsrh27e6xsgdx3v8m1pv1r
req_01k1hsrh04em0br1kb5bkcjbq7
req_01k1hsrh01fsk8h9bkxqjpekxn

yawnxyz · August 1, 2025, 4:57am

Thank you for the report, I’ve added your request IDs to the issue. Hopefully their fix will tackle all of these errors

qeeebo · August 1, 2025, 2:26pm

The batches seem to be going through faster now, but overnight a few of them stalled when almost complete, at around 98% done. See the batch IDs below:
batch_01k1fc8z97fmhvp4x4n6nnfzp
batch_01k1hj9z66e3y87y2ggpv78qkk
batch_01k1hja89wfne9jbcg32wws47y
batch_01k1hjahx0fnftwqtephkvt6nt
batch_01k1hjbqnqfr4rrab1azhex0s0
batch_01k1hjbdk1fhzbq626np83rnq6
batch_01k1hjavqge47vw72zne260btc

yawnxyz · August 1, 2025, 4:57pm

Thank you for reporting, added these to the issue tracker as well

qeeebo · September 26, 2025, 9:47pm

Just wanted to share that the improvements made to the Groq batch system are remarkable. It is smooth and fast, and we have not seen any latency issues or failed batches. Well done, Groq team!

yawnxyz · September 26, 2025, 10:29pm

Oh, thank you so much!! We have a lot more speed tweaks coming in, stay tuned!

And thank you for reporting the errors; every error you report makes our system better.
