Here are Groq product, feature, and model announcements from the Groq community.

Mistral Saba 24B is now live on GroqCloud and via Groq API for everyone!

Mistral Saba 24B was trained for native comprehension of Arabic and regional languages, with lightning-fast inference hosted from our Saudi Arabia cluster. For developers in MENA and South Asia, this means even lower latency and more natural interactions for local users.

Saba Quick Specs:

  • Text → Text (with tool use support)
  • 32K context window
  • October 2024 training cutoff
  • Recognizes cultural connections across regions

Saba handles linguistic nuances better than comparable models on specialized Arabic benchmarks such as TyDiQA-GoldP and AlGhafa. If you're a native speaker of Arabic or other regional languages, we'd love your real-world feedback!
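To try Saba yourself, the request shape is the standard OpenAI-compatible chat-completion body. A minimal sketch of building one in Python (the helper name and system-prompt convention are illustrative, not part of the Groq API; only the model ID and message schema come from this announcement):

```python
import json

def build_saba_request(prompt, system=None):
    """Build a chat-completion payload for Mistral Saba 24B (illustrative helper)."""
    messages = []
    if system:
        # Optional system message, e.g. to steer the response language.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return {"model": "mistral-saba-24b", "messages": messages}

payload = build_saba_request(
    "ما هي عاصمة المملكة العربية السعودية؟",  # "What is the capital of Saudi Arabia?"
    system="Answer in Arabic.",
)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Send the payload to the chat completions endpoint with your API key, exactly as with any other Groq model.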

In case you missed it, our Developer Tier is now live and it takes just a few minutes to upgrade for more tokens. Happy building.


Groq Console just got a huge refresh!

We now have:

  • Chat → Studio for serious prompting
  • Unified Dashboard for metrics, logs, batch jobs, limits
  • Easily accessible API keys and docs

Next up is a new home for the landing page. What would you like to see on it? We're still debating the details and would love your input and feedback for what we have so far in #channel.


Appreciate the love!


Today is Wednesday... otherwise known as Qwendnesday. Drum roll, please!

Alibaba Qwen's QwQ-32B (model ID = qwen-qwq-32b) is now live on GroqCloud and via Groq API for the fastest reasoning in the world!

Models are getting smaller and smarter—DeepSeek-R1 (671B) surprised us all just a couple of months ago and now we have a 20x smaller, mightier model rivaling its performance.

QwQ-32B is matching or beating DeepSeek-R1 and o1-mini across key benchmarks, while using only ~5% of the parameters. This means lower inference costs without sacrificing quality or reasoning capability.

The Qwen team has accomplished a lot with reinforcement learning (RL), showing you don't need massive compute or MoE architectures. RL on a strong base model is all you need to unlock reasoning capabilities and enhanced performance.

This is especially exciting for AI agent builders—QwQ-32B was explicitly designed for tool use and adapting its reasoning based on environmental feedback. Let us know what you build with it!
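When consuming QwQ-32B output, note that reasoning models of this family typically emit their chain of thought inside `<think>...</think>` tags before the final answer. A small sketch of separating the two, assuming that tag format (the helper is illustrative, not part of the Groq SDK):

```python
import re

def split_reasoning(text):
    """Split a reasoning-model response into (reasoning, answer),
    assuming the chain of thought is wrapped in <think>...</think>."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>12 * 12 = 144, so the square root of 144 is 12.</think>The answer is 12."
)
print(answer)  # The answer is 12.
```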


Groq API now supports word-level timestamps for transcriptions!

This has been one of our most requested features and is now available for all Whisper models (whisper-large-v3, whisper-large-v3-turbo, distil-whisper-large-v3-en).

How to implement word-level timestamps:

  • Set response_format to "verbose_json"
  • Add timestamp_granularities: ["word"] to your request
  • Groq API will return precise start/end timings for each word in your transcript

This feature enables interactive transcript navigation, precise subtitle generation, and searchable audio content.
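Once the verbose_json response comes back with word granularity, each entry in its `words` array carries `word`, `start`, and `end` fields. A sketch of turning those into simple subtitle cues (the grouping helper is illustrative, not part of the API):

```python
def words_to_cues(words, max_words=3):
    """Group word-level timestamp entries into subtitle cues of up to
    max_words words, keeping the first start and last end time."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        cues.append({
            "start": chunk[0]["start"],
            "end": chunk[-1]["end"],
            "text": " ".join(w["word"] for w in chunk),
        })
    return cues

# Shape of entries as returned in verbose_json's "words" array:
sample = [
    {"word": "Groq", "start": 0.0, "end": 0.3},
    {"word": "is", "start": 0.3, "end": 0.4},
    {"word": "fast", "start": 0.4, "end": 0.8},
]
print(words_to_cues(sample))
```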

We've also updated our audio chunking tutorial to include support for segment and/or word timestamp granularities for long audio files. Check it out: audio_chunking_tutorial.ipynb

Did you notice you can now easily copy model IDs to your clipboard from the models page?


BIG NEWS: The Groq + Vercel integration is live!

Connect your Vercel projects directly to GroqCloud for ultra-fast AI inference. Build fast, deploy easily, and get low-latency access to state-of-the-art models.

Try it now: [https://vercel.com/integrations/groq](https://vercel.com/integrations/groq)

Read more in our blog: [https://groq.com/groq-vercel-partner-to-make-building-fast-and-simple/](https://groq.com/groq-vercel-partner-to-make-building-fast-and-simple/)


We now have text-to-speech models available for everyone on Groq for fast speech generation in both English and Arabic!

We already support speech-to-text, so this enables end-to-end voice agents.

Docs and code snippets: https://console.groq.com/docs/text-to-speech

Try it out and let us know what you think. We're working on additional features such as word-level timestamps for TTS (already available for STT) next, but want to hear from you about what else to prioritize. As always, build fast!
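As a starting point, the request body for speech generation follows the same OpenAI-compatible shape as the rest of the API. A minimal sketch of building one; note the model and voice IDs below are assumptions, so check the docs linked above for the current values:

```python
import json

def build_tts_request(text, model="playai-tts", voice="Fritz-PlayAI",
                      response_format="wav"):
    """Build a text-to-speech request body. Model and voice IDs are
    placeholders; consult the Groq TTS docs for supported values."""
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }

payload = build_tts_request("Hello from Groq!")
print(json.dumps(payload, indent=2))
```

POST the payload to the speech endpoint and write the binary response to an audio file.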

Demo video


LLAMA 4 is now available on Groq!

Fast? Yes. Free Tier? Yes. Lowest price on Dev Tier for higher limits? Yes. Upgrade for $0.11/$0.34 per million input/output tokens.
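For quick budgeting with the Dev Tier pricing above ($0.11 per million input tokens, $0.34 per million output tokens), the arithmetic is:

```python
def llama4_cost(input_tokens, output_tokens,
                in_price=0.11, out_price=0.34):
    """Cost in dollars given token counts and per-million-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. 2M input tokens + 500K output tokens
cost = llama4_cost(2_000_000, 500_000)
print(f"${cost:.2f}")  # $0.39
```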

As always, build fast and have fun!

Demo video


We've launched our own agent, Compound!

One API call combines web search and code execution, so you can do things like:

  • Check the weather
  • Get the latest stock prices
  • Graph bitcoin prices over time

Try it out in Chat: https://chat.groq.com

Use it in the API: just replace the model name with compound-beta or compound-beta-mini. See the docs: https://console.groq.com/docs/agentic-tooling

This is still in beta, so please leave us feedback in the community or through the feedback form.

Example usage (curl):

curl https://api.groq.com/openai/v1/chat/completions -s \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -d '{
    "model": "compound-beta",
    "messages": [{
      "role": "user",
      "content": "What is the current weather in Tokyo?"
    }]
  }'

Example usage (Python):

from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the current weather in Tokyo?",
        }
    ],
    model="compound-beta",
)

print(completion.choices[0].message.content)

# Print all tool calls
# print(completion.choices[0].message.tool_calls)
