Mistral Saba 24B is now live on GroqCloud and via Groq API for everyone!
Mistral Saba 24B was trained for native comprehension of Arabic and regional languages, with lightning-fast inference hosted from our Saudi Arabia cluster. For developers in MENA and South Asia, this means even lower latency and more natural interactions for local users.
Saba Quick Specs:
- Text → Text (with tool use support)
- 32K context window
- October 2024 training cutoff
- Recognizes cultural connections across regions
Saba handles linguistic nuances better than comparable models on specialized Arabic benchmarks such as TyDi QA GoldP and AlGhafa. If you're a native speaker of Arabic or another regional language, we'd love your real-world feedback!
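To try Saba from Python, here's a minimal sketch using only the standard library (no SDK required). The model ID `mistral-saba-24b` is an assumption on our part; confirm the exact ID on the models page before using it.

```python
import json
import os
import urllib.request

# Minimal chat request to Groq API. The model ID "mistral-saba-24b"
# is assumed; check the models page for the exact ID.
payload = {
    "model": "mistral-saba-24b",
    "messages": [
        {"role": "user", "content": "مرحبا! عرّف عن نفسك في جملة واحدة."}
    ],
}

api_key = os.environ.get("GROQ_API_KEY")
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        "https://api.groq.com/openai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
```

The same payload works with the Groq Python SDK via `client.chat.completions.create(**payload)`.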
In case you missed it, our Developer Tier is now live and it takes just a few minutes to upgrade for more tokens. Happy building.
Groq Console just got a huge refresh!
We now have:
- Chat → Studio for serious prompting
- Unified Dashboard for metrics, logs, batch jobs, limits
- Easily accessible API keys and docs
Next up is a new home for the landing page. What would you like to see on it? We're still debating the details and would love your input and feedback for what we have so far in #channel.
Appreciate the love!
Today is Wednesday... otherwise known as Qwendnesday. Drum roll, please!
Alibaba Qwen's QwQ-32B (model ID: `qwen-qwq-32b`) is now live on GroqCloud and via Groq API for the fastest reasoning in the world!
Models are getting smaller and smarter—DeepSeek-R1 (671B) surprised us all just a couple of months ago and now we have a 20x smaller, mightier model rivaling its performance.
QwQ-32B is matching or beating DeepSeek-R1 and o1-mini across key benchmarks, while using only ~5% of the parameters. This means lower inference costs without sacrificing quality or reasoning capability.
The Qwen team has accomplished a lot with reinforcement learning (RL), showing you don't need massive compute or MoE architectures. RL on a strong base model is all you need to unlock reasoning capabilities and enhanced performance.
This is especially exciting for AI agent builders—QwQ-32B was explicitly designed for tool use and adapting its reasoning based on environmental feedback. Let us know what you build with it!
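To sketch what tool use with QwQ-32B might look like, here are the request parameters you could pass to `client.chat.completions.create(...)` in the Groq Python SDK. The `get_weather` tool is hypothetical, included only to illustrate the standard tool schema:

```python
# Parameters for a tool-use request with QwQ-32B.
# The get_weather tool below is hypothetical; define tools
# matching whatever functions your agent actually exposes.
params = {
    "model": "qwen-qwq-32b",
    "messages": [
        {"role": "user", "content": "What's the weather in Riyadh?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide when to call tools
}
```

If the model decides a tool is needed, the response carries the call in `completion.choices[0].message.tool_calls` instead of plain text.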
Groq API now supports word-level timestamps for transcriptions!
This has been one of our most requested features and is now available for all Whisper models (`whisper-large-v3`, `whisper-large-v3-turbo`, `distil-whisper-large-v3-en`).
How to implement word-level timestamps:
- Set `response_format` to `"verbose_json"`
- Add `timestamp_granularities: ["word"]` to your request
- Groq API will return precise start/end timings for each word in your transcript
This feature enables interactive transcript navigation, precise subtitle generation, and searchable audio content.
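The steps above can be sketched as follows. The first dict shows the parameters you'd pass to `client.audio.transcriptions.create(...)` (file handling omitted); the second part parses an illustrative slice of a `verbose_json` response, where each word entry carries start/end times in seconds:

```python
import json

# Request parameters for word-level timestamps (pass alongside the
# audio file to client.audio.transcriptions.create in the Groq SDK).
params = {
    "model": "whisper-large-v3-turbo",
    "response_format": "verbose_json",
    "timestamp_granularities": ["word"],
}

# Illustrative fragment of the "words" array in a verbose_json
# response; the actual timings depend on your audio.
sample_words = json.loads(
    '[{"word": "Hello", "start": 0.0, "end": 0.32},'
    ' {"word": "world", "start": 0.38, "end": 0.71}]'
)
for w in sample_words:
    print(f'{w["start"]:>6.2f}s - {w["end"]:<6.2f}s  {w["word"]}')
```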
We've also updated our audio chunking tutorial to include support for segment and/or word timestamp granularities for long audio files. Check it out: audio_chunking_tutorial.ipynb
Did you notice you can now easily copy model IDs to your clipboard from the models page?
BIG NEWS: The Groq + Vercel integration is live!
Connect your Vercel projects directly to GroqCloud for ultra-fast AI inference. Build fast, deploy easily, and get low-latency access to state-of-the-art models.
Try it now: [https://vercel.com/integrations/groq](https://vercel.com/integrations/groq)
Read more in our blog: [https://groq.com/groq-vercel-partner-to-make-building-fast-and-simple/](https://groq.com/groq-vercel-partner-to-make-building-fast-and-simple/)
We now have text-to-speech models available for everyone on Groq for fast speech generation in both English and Arabic!
We already support speech-to-text, so this enables end-to-end voice agents.
Docs and code snippets: https://console.groq.com/docs/text-to-speech
Try it out and let us know what you think. We're working on additional features such as word-level timestamps for TTS (already available for STT) next, but want to hear from you about what else to prioritize. As always, build fast!
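As a rough sketch of a TTS request with the Groq Python SDK: the parameters below are assumptions, and the voice name in particular is hypothetical. Check the docs linked above for the actual model IDs and voice list.

```python
# Sketch of a text-to-speech request. The voice name below is
# hypothetical; see the TTS docs for supported models and voices.
params = {
    "model": "playai-tts",            # assumed English TTS model ID
    "voice": "Fritz-PlayAI",          # hypothetical voice name
    "input": "Hello from Groq text-to-speech!",
    "response_format": "wav",
}

# With the SDK, you would pass these to the speech endpoint and
# write the audio bytes to a file, roughly:
# response = client.audio.speech.create(**params)
# response.write_to_file("speech.wav")
```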
LLAMA 4 is now available on Groq!
Fast? Yes. Free Tier? Yes. Lowest price on Dev Tier for higher limits? Yes. Upgrade for $0.11/$0.34 per million input/output tokens.
- Llama 4 on Groq Console Playground
- Vision docs with code snippets for JSON mode, local image upload, URL image upload, multi-turn, etc.
- Llama 4 blog post
As always, build fast and have fun!
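For a quick taste of the vision support documented above, here's a sketch of a multimodal request using the standard image-URL message format. The model ID and image URL are placeholders; check the console models page and vision docs for the exact IDs:

```python
# Vision request sketch: a text part plus an image_url part in a
# single user message. Model ID and URL below are placeholders.
params = {
    "model": "meta-llama/llama-4-scout-17b-16e-instruct",  # assumed ID
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image in one sentence."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
}
# completion = client.chat.completions.create(**params)
```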
We've launched our own agent, Compound!
One API call combines web search and code running, so you can do things like:
- Check the weather
- Get the latest stock prices
- Graph bitcoin prices over time
Try it out in Chat: https://chat.groq.com
Use it in the API: just replace the model name with `compound-beta` or `compound-beta-mini`. See the docs: https://console.groq.com/docs/agentic-tooling
This is still in beta, so please leave us feedback in the community or through the feedback form.
Example usage (curl):
curl https://api.groq.com/openai/v1/chat/completions -s \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $GROQ_API_KEY" \
-d '{
"model": "compound-beta",
"messages": [{
"role": "user",
"content": "What is the current weather in Tokyo?"
}]
}'
Example usage (Python):
from groq import Groq
client = Groq()
completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "What is the current weather in Tokyo?",
}
],
model="compound-beta",
)
print(completion.choices[0].message.content)
# Print all tool calls
# print(completion.choices[0].message.tool_calls)