Just got the email that Kimi K2-0905 is being deprecated April 15th. This is becoming a pattern.
Since the Nvidia acquisition, it feels like Groq has been in a cycle of pulling models one by one, with no compelling replacements lined up. The suggested migration path, GPT OSS 120B, is not an equivalent model, and “check our model list” isn’t a transition plan.
The entire value proposition of Groq was speed + reliability. The speed is still unmatched, but what good is fast inference if you can’t count on a model being available 6 months from now?
Nobody wants to re-engineer their prompts, evals, and pipelines every quarter because the underlying model vanished.
For those of us running production workloads on Groq: what alternatives are you looking at?
Specifically interested in providers that can come close to Groq’s latency while offering more model stability. OpenRouter is the obvious candidate, but its throughput and latency aren’t in the same league.
A few things I’d want from any alternative:
- Sub-second time to first token on mid-size models
- Commitment to model availability windows (minimum 12 months)
- Transparent deprecation policy with adequate migration timelines
And to the Groq team, if you’re reading: your hardware is incredible. But the model churn is making it impossible to recommend Groq for anything beyond prototyping. A public model lifecycle policy would go a long way.