DeepSeek's new model, which can switch between think/no-think, is convenient.
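For anyone curious how the switch works in practice: with DeepSeek's OpenAI-compatible API, thinking mode on V3.1 is selected by model ID (`deepseek-reasoner` thinks, `deepseek-chat` doesn't). A minimal sketch, assuming the currently published IDs; the key and prompts are placeholders:

```python
# Sketch: toggling DeepSeek's think/no-think modes via its
# OpenAI-compatible API. On DeepSeek V3.1, "deepseek-chat" runs the
# model in non-thinking mode and "deepseek-reasoner" in thinking mode;
# treat the exact IDs as subject to change.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_KEY",          # placeholder
    base_url="https://api.deepseek.com",
)

def ask(prompt: str, think: bool) -> str:
    resp = client.chat.completions.create(
        model="deepseek-reasoner" if think else "deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("Is 2^31 - 1 prime?", think=True))    # slower, reasons first
print(ask("Say hi in French.", think=False))    # fast, direct answer
```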
Wait, are y’all talking about this model? deepseek-coder-v2
I'd love it if you hosted some of the ServiceNow AI models, like Apriel v1.6 15B Thinker. I think it would be an excellent model to provide.
Yeah, if it's possible, can't we get the 236B one (I think), since that one is crazy for autonomous coding? I want it so bad xD
No, I'm saying DeepSeek V3.1 or V3.2. The DeepSeek V2 series is the old version.
I also have some suggestions I'd like to discuss with the community. First, I think gpt-oss is a model that performs well in benchmarks but is quite average in actual use. Its code has many errors that are hard to look at, and its very strict content policy restrictions hurt the actual experience. Maybe those models could be taken down to free up server resources for models that are really useful. I also recommend a model called GLM-4.6. Its previous generation, GLM-4.5, has good reviews for its coding ability, which is on par with Claude Sonnet 4, and its cost is not high.
We've been evaluating the lot. gpt-oss is a decently fast model that balances code writing, tool use, and data extraction, and many of us use it day to day! It's definitely more corporate-leaning, but it's still very popular!
v2 coding not v2 general xD
They are good, just not coding-specific tho.
I'm using OSS right now for my AI coding project, and it's good. Only thing is, I want a model that's specialized for coding, since it'd have fewer errors, better syntax, cleaner code (you get it).
Anyways, thanks for listening to the community ;=)
(sorry if I replied several times, I got so confused with the system)
Yeah Discourse takes a bit to get used to. We’re cooking a coding model, announcements soon!
All the models on Groq suck for building apps, besides maybe Kimi K2 as a proof of concept…
Gemini 3 Flash just launched, and Opus 4.5 and Composer-1 from Cursor are way too good at coding. Stop wasting your time on coding; the war is lost.
They all use Google's TPUs, so the tokens/sec is fast enough now, possibly faster than Groq.
Where inference is slow and users are willing to pay today is in image and video inference. The open-source models there are amazing, and LoRAs for styles, skin fixing, upscaling, and all sorts of stuff are taking off.
If you added an image model like FLUX 2 Pro and it was fast, a video model like Wan 2.2-2.6, a video editing model like SCAIL… now that would be an amazing API as a dev.
Opus, Sonnet, and Composer are still some of the best models for writing code; Composer-1 is a fine-tuned open source model optimized for Cursor and works really well in that environment.
They don't all use TPUs exclusively. Anthropic does use a few TPU clusters, but that's to beef up their overall availability (it's actually quite interesting how they have to juggle the differences between GPUs and TPUs).
For coding, open-source models definitely fall behind Opus and Sonnet, but they make up for it in price and speed. If you're doing stuff like scanning for vulnerabilities, Opus/Sonnet will very quickly burn a hole in your pocket (and Composer can't be used via API); that's where open-source models can step in.
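To make the price/speed point concrete, here's a minimal sketch of the kind of bulk scan where a cheap open model wins, against Groq's OpenAI-compatible endpoint. The model ID, prompt, and repo layout are assumptions for illustration, not a recommendation:

```python
# Sketch: bulk vulnerability triage with a cheap open-source model on
# Groq's OpenAI-compatible endpoint. Model ID and prompt are
# illustrative; swap in whatever coding model is actually hosted.
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GROQ_KEY",                      # placeholder
    base_url="https://api.groq.com/openai/v1",
)

def scan(source: str) -> str:
    resp = client.chat.completions.create(
        model="openai/gpt-oss-120b",              # assumed hosted model ID
        messages=[
            {"role": "system",
             "content": "List likely security vulnerabilities in this code. Be terse."},
            {"role": "user", "content": source},
        ],
    )
    return resp.choices[0].message.content

# Sweeping a whole repo like this is where per-token price dominates:
for path in Path("src").rglob("*.py"):
    print(path, "->", scan(path.read_text())[:200])
```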
Video and image gen models are interesting; right now, though, from a business perspective they're not on our radar (we're mostly going for big "boring" business use cases right now).
Z-image as the image model?
Hope it goes well!
Would love to see the following models (it might be a long list):
- Text-to-Speech: Chatterbox Turbo, Kokoro-82M, VibeVoice (0.5B, 1.5B, Large)
- Speech-to-Text: zai-org/GLM-ASR-Nano-2512, nvidia/parakeet-tdt-0.6b-v3
Some Text-to-Image models as well
Please, please, please add GLM 4.7. I love Groq, everything is awesome, but there's a lack of capable models here. OSS-120B is a good general model, but it doesn't match the quality of the GLM models. If you're testing it, please make it experimental so we can experiment with it too.
Some embedding models please. I keep having to get multiple subscriptions; it would be nice if I could just stick to Groq.
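For what it's worth, here's the call shape that would enable, following the standard OpenAI-compatible embeddings API. Groq doesn't host embedding models today, so the model ID below is hypothetical (the base URL is the one that works for chat):

```python
# Hypothetical: what an embeddings call on Groq could look like if they
# hosted one, using the standard OpenAI-compatible embeddings API.
# The model ID is made up for illustration.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GROQ_KEY",                      # placeholder
    base_url="https://api.groq.com/openai/v1",
)

resp = client.embeddings.create(
    model="some-embedding-model",                 # hypothetical ID
    input=["vector databases are neat", "groq is fast"],
)
vectors = [d.embedding for d in resp.data]
print(len(vectors), "vectors of dim", len(vectors[0]))
```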
Could you please consider supporting this open-source model:
https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash
MiMo-V2-Flash uses a GPT-OSS-style decoder with hybrid attention (5 SWA layers per full-attention layer), which significantly reduces KV-cache size and improves decoding speed. The model shows competitive overall quality (see https://artificialanalysis.ai/ and the MiMo-V2-Flash (free) listing on OpenRouter).
With MTP enabled on GPUs, it already reaches ~150 tokens/s, and Groq can definitely achieve much higher speeds! We believe Groq can also benefit from MTP, and MiMo-V2-Flash would be a strong model to showcase Groq's inference throughput advantages.
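To put a rough number on the KV-cache claim: with 5 sliding-window layers per full-attention layer, only one layer in six caches the whole context. A back-of-envelope sketch, where the 4k window and 128k context are assumptions (only the 5:1 ratio is from the description above):

```python
# Back-of-envelope: KV-cache footprint of hybrid attention vs. all
# full-attention layers. The 5:1 SWA-to-full layer ratio comes from the
# MiMo-V2-Flash description; window and context sizes are assumed.
def kv_cache_ratio(context_len: int, window: int,
                   swa_per_full: int = 5) -> float:
    """Fraction of per-token KV entries kept, hybrid vs. full attention."""
    layers = swa_per_full + 1                     # one repeating block
    # SWA layers cache at most `window` positions; the full layer caches all.
    hybrid = swa_per_full * min(window, context_len) + context_len
    full = layers * context_len
    return hybrid / full

# Example: 128k context, 4k sliding window (assumed), 5 SWA : 1 full.
print(f"{kv_cache_ratio(128_000, 4_096):.2%} of the full-attention cache")
# -> roughly 19%, consistent with "significantly reduces KV-cache size".
```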
Qwen3 VL family of models (text generation, embeddings, rerankers)
Seedream (Image gen)