Meta's Llama 3 8B with an 8K context length is now running at approximately 1250 tokens/second per user on the LPU Inference Engine in GroqChat and GroqCloud.
Thanks to our LPU architecture, performance keeps getting better!
We're working on improving tool calling on top of open source models like Llama 3. To help us test an internal preview of an improved model, we're looking for examples where tool calling currently fails when using Groq API.
If you've experienced failures or errors with tool calling, please submit examples using this form. Everyone who contributes a reproducible example will be eligible for early access to the improved tool calling endpoint!
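If you're writing one up, a reproducible report is easiest to act on when it includes the exact request body. Here's a minimal sketch of an OpenAI-compatible tool-calling request; the `get_weather` tool and its parameters are illustrative placeholders, not an official example:

```python
# Minimal OpenAI-compatible tool-calling request body (sketch).
# The get_weather tool is a hypothetical placeholder for illustration.
payload = {
    "model": "llama3-8b-8192",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide whether to call the tool.
    "tool_choice": "auto",
}
```

Including a body like this (plus the response you got) makes a failure easy to reproduce on our side.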
Whisper Large V3 is now live on the GroqCloud Developer Playground and available via Groq API!
Whisper is running on our Groq LPU Inference Engine at a 166x speed factor according to the latest Artificial Analysis speech-to-text benchmark.
We can't wait to see the cool apps you develop using Whisper on Groq!
Scheduled maintenance will begin at 1400 PDT / 2100 UTC today (Saturday).
GroqCloud maintenance is planned for:
1400 PDT / 2100 UTC on 29-June-2024 (Saturday) to 1700 PDT on 29-June-2024 / 0000 UTC on 30-June-2024 (3 hours).
Affected services:
API (inference)
Web interface
While we expect minimal overall impact, API users may see intermittent errors and elevated inference error rates during this window, and some administrative account operations will be disabled during maintenance.
We will notify the community when maintenance is complete. If you experience errors after the window, please alert us via normal support channels.
Maintenance completed: 1651 PDT
Gemma 2 9B is now live on GroqChat and via Groq API!
We can't wait to see what you build with it. For inspiration, check out this 4th of July trivia game created by GroqChamp @martinbowling using Gemma 2 9B: Infinite Fourth Trivia powered by Groq.
We're excited to introduce our first-ever cohort of GroqChamps, a talented team of developers and leaders who have been helping to shape our Groq developer community!
Our GroqChamps have already been actively contributing with their projects, answering Groq-related questions, and sharing their expertise. Please help us give them a warm welcome!
Ivan (@Ivan): Student at the Munich Center for Digital Sciences and AI, passionate about computers and building exciting projects.
Kendell (@KTibow): Student from the Pacific Northwest, building Groq into personal projects and helping the community.
Harry (@Harry): Tech consultant on the East Coast, enjoys answering questions about Groq API for startups and SMBs.
Mojtaba (@S4mpl3r): Computer Engineering graduate and ML engineer, always up for a chat about CS and ML.
Hossein (@Unclecode): Entrepreneur, investor, and founder of Kidocode, develops small language models and enjoys coffee and Star Wars.
Martin (@martinbowling): Indie hacker and Replit alum, passionate AI developer, excited about Groq's fast inference.
Welcome to the team, GroqChamps! We're thrilled to have you here!
We're excited to announce the release of two new open-source models specifically designed for tool use, available via Groq API and HuggingFace: Llama-3-Groq-70B-Tool-Use and Llama-3-Groq-8B-Tool-Use, built with Meta Llama-3.
These models represent a significant advancement in open-source AI capabilities for tool use/function calling:
- #1 on the Berkeley Function Calling Leaderboard ahead of other open-source and proprietary models
- 1050 tokens per second on 8B and 330 tokens per second on 70B
Kudos to @λRick, the Groq team, and collaborators at Glaive!
Read the full blog here to learn how we did it and see the benchmark results.
The Llama 3.1 models released this morning are now live on GroqChat and via Groq API!
Llama 3.1 8B, 70B, and 405B are available via GroqChat, and Llama 3.1 8B and 70B are available via Groq API. Early API access to Llama 3.1 405B is currently available to select Groq customers only, with general availability coming soon.
This release marks a huge milestone for open-source models, with Llama 3.1 405B being the largest and most capable openly available model to date, featuring a 128K context length and built-in tool calling.
Read more about the excitement [here](https://wow.groq.com/now-available-on-groq-the-largest-and-most-capable-openly-available-foundation-model-to-date-llama-3-1-405b/). We can't wait to see what you build with Llama 3.1 on Groq!
We're excited to introduce the beta release of @groqbot, our AI assistant designed to enhance your experience in our community! Built in partnership with Vectorize, @groqbot utilizes the Vectorize Retrieval-Augmented Generation (RAG) pipeline with vector search indexes for both our official docs and help forums to provide accurate responses.
How to use @groqbot:
- Start a thread in #channel
- Use the !ask command and type your question
- Use the !ask command to ask follow-up questions (context in your help thread will be kept)
- Leave a thumbs up on @groqbot's answer if you're satisfied or a thumbs down if you aren't
- If @groqbot isn't able to help, Groqsters, GroqChamps, or other members will offer best effort help!
We're excited for you to try out @groqbot and will be collecting feedback during the beta release. Note that @groqbot isn't a replacement for human conversations, but is a helpful extra resource.
Special thanks to @Aari Vaidya and @DK09876 from Vectorize for building @groqbot from scratch!
Distil-Whisper is now live on GroqCloud and available on GroqChat and via Groq API!
Distil-Whisper is a distilled version of OpenAI's Whisper large-v3 model, designed for faster English speech recognition at a lower cost while maintaining comparable accuracy.
Key Highlights Compared to Whisper large-v3:
- 240x real-time speed factor
- Word Error Rate (WER) within 2.4% on short-form transcriptions
- Robust to noise and reduced hallucination (1.3x fewer instances of repeated 5-gram word duplicates and a 2.1% reduction in insertion error rate)
Check out our official docs for more info and usage examples.
Distil-Whisper is running on our Groq LPU Inference Engine at a 240x real-time speed factor according to the latest Artificial Analysis speech-to-text benchmark. We can't wait to see the cool apps you'll develop!
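As a sketch of what a transcription call looks like, here is the shape of a request to the OpenAI-compatible transcription endpoint. This only prepares the form fields; actually sending it requires a Groq API key and an audio file, and the model ID is assumed from the docs:

```python
# Sketch: prepare (but do not send) a speech-to-text request for the
# OpenAI-compatible audio transcription endpoint. Sending it requires
# a real API key and an audio file attached as multipart form data.
url = "https://api.groq.com/openai/v1/audio/transcriptions"
form = {
    "model": "distil-whisper-large-v3-en",  # model ID assumed from the docs
    "response_format": "json",
    "language": "en",
}
# files = {"file": open("speech.m4a", "rb")}  # attach your audio here
```

From there, any HTTP client (or the official SDK) can POST the form with your `Authorization: Bearer <API key>` header.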
Our team has launched a new logs page for observability features! You can view it at https://console.groq.com/settings/logs.
Metrics for errors, latency, saturation, and more are in the works. We'd love your feedback on the logs page in #channel.
We hope logs will make it easy to observe metrics (such as TTFT and latency) for your API requests and to debug errors by request_id. This is just the beginning - more to come!
Also, huge shoutout to GroqChamps and everyone who makes this community such a collaborative and entertaining place to be! In honor of 20K, we're doing a Groq swag giveaway you can enter below!
LLaVA is now available for preview via Groq Console Playground and Groq API!
LLaVA Preview
- Model ID: llava-v1.5-7b-4096-preview
- Description: Our LLaVA V1.5 7B preview release is a large multimodal model that combines a vision encoder and Vicuna for visual and language understanding. See the model card here.
- Context Window: 4,096 tokens
Get Started with LLaVA Preview
- Docs: https://console.groq.com/docs/vision
- Groq API Cookbook Tutorial: https://github.com/groq/groq-api-cookbook/tree/main/tutorials/llava-image-processing
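To give a feel for the request shape, here is a minimal sketch of a multimodal chat payload mixing text and an image URL in a single user turn; the prompt and image URL are placeholders:

```python
# Sketch of a multimodal chat request for the LLaVA preview model:
# one user turn containing both a text part and an image_url part.
# The prompt and image URL below are placeholders.
payload = {
    "model": "llava-v1.5-7b-4096-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
}
```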
With LLaVA joining our lineup, Groq now supports text, speech, and vision. You can leverage Groq API to process and understand data in all three modalities with lightning-fast inference speed, unlocking a wide range of apps and use cases!
We're always looking to see what you build - share in #channel, and remember you can create tutorials/guides and submit a PR to our Groq API Cookbook for the community to learn from!
@groqbot is now officially online again!
We'd appreciate any and all feedback in #channel to help us improve. Many of you have also asked about open-sourcing the code so you can contribute, which we plan to do in the future!
The new Meta Llama 3.2 models are now available via GroqCloud and Groq API!
You can try them out at console.groq.com/playground for:
- llama-3.2-1b-preview
- llama-3.2-3b-preview
- llama-3.2-11b-text-preview
- llama-3.2-11b-vision-preview (vision available soon)
- llama-3.2-90b-text-preview
Why is this exciting?
- Llama 3.2 3B matches Llama 3.1 8B on IFEval (great for on-device RAG or Agents)
- Vision models competitive with Claude 3 Haiku and GPT-4o mini on image tasks (supports multi-turn conversations about images, tool use, and JSON mode)
- 3B model outperforms Gemma 2 2.6B and Phi 3.5-mini on various NLP tasks
Groq's partnership with Meta is driving the open-source AI ecosystem forward. We're proud to help make fast inference available for your apps and push the boundaries of what's possible in AI. Happy building!
llama-3.2-11b-vision-preview is now available via GroqCloud and Groq API!
See docs here: https://console.groq.com/docs/vision
See multimodal tutorials for tool use and JSON mode here: https://github.com/groq/groq-api-cookbook/tree/main/tutorials/multimodal-image-processing
We're excited to see what you build with it and will be checking #channel to reach out to some of you for a feature on our applications showcase in our official docs.
We're always looking for new contributions to our Groq API Cookbook! Check out our contribution guidelines and feel free to submit a PR for a guide or tutorial to benefit the community.
Happy Friday! Here are three updates:
1) JigsawStack released their Prompt Engine feature powered by Groq. Learn more in their announcement thread.
Docs: https://console.groq.com/docs/jigsawstack
Blog: https://jigsawstack.com/blog/jigsawstack-mixture-of-agents-moa-outperform-any-single-llm-and-reduce-cost-with-prompt-engine
2) We now support assistant message prefilling! This is a hidden gem, and many have asked for it. More content and use cases coming soon. In the meantime, see our docs: https://console.groq.com/docs/prompting
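As a quick sketch of prefilling: the trick is to end the messages list with a partial assistant turn that the model then continues. The JSON-list prefill below is just an illustration:

```python
# Sketch: assistant message prefilling. The final message has role
# "assistant" and contains the start of the reply; the model continues
# generating from that prefix instead of starting fresh.
messages = [
    {"role": "user", "content": "List three primes as a JSON object."},
    {"role": "assistant", "content": '{"primes": ['},  # the prefill
]
```

This is handy for steering output format, e.g. forcing a response to begin as JSON.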
3) @kraken and team are working on improving the look and feel for a better experience. Check out the new updates in the attached video and please share any feedback in #channel.
Wednesdays are for Whisper...
whisper-large-v3-turbo is now available on Groq!
Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3 by OpenAI with decoding layers reduced from 32 to 4 for increased speed. Running on Groq, we're turbo-charging turbo for even more speed. See the benchmark for details.
Key Highlights:
- Great balance of speed, quality, and capabilities
- 216x real-time speed factor, faster than Whisper Large v3 while maintaining multilingual capabilities
- 1% lower Word Error Rate (WER) than Distil-Whisper
Llama 3.1 70B with Speculative Decoding (llama-3.1-70b-specdec) is now available for paying customers!
We've achieved a ~6x performance boost (~250 T/s to ~1660 T/s) for Llama 3.1 70B on our first-gen 14nm LPU through speculative decoding, which uses a smaller draft model to generate tokens for verification by the primary model. We're nowhere near our first-gen silicon's limits and can't wait to unveil our 4nm silicon!
Details:
- Speed: ~1,665 tokens/second (independently verified by Artificial Analysis with zero quality degradation)
- Context window: 8,192 tokens
- Pricing: $0.59/M input tokens, $0.99/M output tokens
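For intuition, here's a toy simulation of the speculative decoding idea (not Groq's implementation): a cheap draft model proposes a block of tokens, and the target model verifies them, accepting the longest agreeing prefix and correcting the first mismatch. The token sequences are made up for illustration:

```python
# Toy illustration of speculative decoding. Both "models" are tiny
# lookup tables invented for this sketch.

def draft_model(prefix):
    # Hypothetical draft model: fast, proposes several tokens at once,
    # but sometimes wrong.
    guesses = {"the": ["quick", "brown", "fox", "runs"]}
    return guesses.get(prefix[-1], [])

def target_model(prefix):
    # Hypothetical target model: authoritative, one token at a time.
    truth = ["the", "quick", "brown", "fox", "jumps"]
    return truth[len(prefix)] if len(prefix) < len(truth) else None

def speculative_step(prefix):
    proposed = draft_model(prefix)
    accepted = []
    for tok in proposed:
        expected = target_model(prefix + accepted)
        if tok == expected:
            accepted.append(tok)           # draft token verified, keep it
        else:
            if expected is not None:
                accepted.append(expected)  # correct the first mismatch
            break
    return accepted

print(speculative_step(["the"]))
```

One target-model verification pass can thus commit several tokens at once, which is where the speedup comes from.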
If you're looking to be part of the pay-per-token Developer Tier, contact us with your organization ID from account settings and we'll onboard you. Please be patient as we process requests!
META'S LLAMA 3.3 70B is now available on Groq!
We just dropped two new model IDs for all users:
- llama-3.3-70b-versatile
- llama-3.3-70b-specdec (speed demon edition)
Why is this a big deal?
We're getting 405B-level performance from a 70B model! Here are a few highlights from the benchmark improvements:
- IFEval: 92.1 (up from 87.5, beats 405B's 88.6!)
- HumanEval: 88.4 (up from 80.5)
- MATH: 77.0 (up from 68.0, beats GPT-4o!)
What's new?
Expect better long-context handling, enhanced JSON function calling, more reliable parameter handling, improved chain-of-thought reasoning, and broader programming language support. Learn more in our blog post.
Ready to code?
Check out Ben's open-source projects already running on Llama 3.3 70B:
- Project 1: infinite-bookshelf
- Project 2: g1
Try it out and let us know what you think!
We're seeing lots of excitement around our pay-per-token Developer Tier! To get onboarded:
- Go to Groq Console
- Click on "Chat with us" on the bottom left
- Click Send us a message β Account β Dev Tier
- Send us a message with your Organization ID (and any feedback for our team!)
Our team is ready to get you onboarded! Let's build incredible things together with higher rate limits on Dev Tier. The demos and use cases you share blow us away every day. Can't wait to see what you create next!
It's been a busy couple of months as we focus on making your tokens go BRRR! Here's a proper update of everything we've shipped:
Pay-Per-Token Developer Tier
We're live with our pay-per-token Developer Tier with higher rate limits, global billing support, batch processing, and flex processing. Sign up here.
If you need even higher rate limits, please contact us here.
Smart Models
The following models are now available on GroqCloud and via Groq API:
- qwen-2.5-32b: Great for coding, math, tool use, instruction-following, and structured outputs (especially JSON), and outperforms GPT-4o mini on some benchmarks. Learn more.
- deepseek-r1-distill-qwen-32b: Excels at reasoning and outperforms top models in various benchmarks. Learn more.
- deepseek-r1-distill-llama-70b-specdec: The fastest reasoning model in the world thanks to our inference speed. Learn more.
- deepseek-r1-distill-llama-70b: Excels at reasoning and outperforms base Llama. Learn more about reasoning models.
See all available models in our docs.
Whisper Enhancements
- Whisper Large v3 is now 67% faster
- Whisper Large v3 audio file limit is now 100MB (up from 40MB)
- Audio chunking tutorial for handling large files
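For files over the limit, one simple approach is to split the audio into fixed-length pieces before uploading. Here's a sketch using only the stdlib wave module (WAV input only; chunk length and filenames are arbitrary choices, see the tutorial for the full recipe):

```python
import wave

def chunk_wav(path, seconds_per_chunk=60):
    """Split a WAV file into fixed-length chunks; return the chunk filenames."""
    out = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * seconds_per_chunk
        i = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            name = f"chunk_{i:03d}.wav"  # arbitrary naming scheme
            with wave.open(name, "wb") as dst:
                dst.setparams(params)    # copy channels/width/rate
                dst.writeframes(frames)
            out.append(name)
            i += 1
    return out
```

Each resulting chunk can then be transcribed separately and the transcripts concatenated.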
We're Hiring for DevRel
If you love building, teaching others, and connecting with developers, we're doing genuinely interesting work with AI inference. See the job description and other openings here.
What's Next for Groq?
- We built the region's largest inference cluster in Saudi Arabia in 51 days and announced a $1.5B agreement for Groq to expand. Read more.
- Text-to-Speech is coming very soon!
- More building and shipping!
Your feedback directly shapes what we build. Always leave your thoughts, feature requests, or just say hiβwe love hearing from you!
Qwen/Qwen2.5-Coder-32B-Instruct is now live on GroqCloud and via Groq API for fast and smart code generation.
Qwen2.5 Coder is state-of-the-art for open-source code generation, beating GPT-4o and Claude 3.5 Sonnet on several benchmarks. It's also a game-changer for debugging workflows.
If you use Cursor as your IDE, here's a quick setup for qwen-2.5-coder-32b:
- Go to Cursor Settings
- Go to Models
- Override the OpenAI Base URL with "https://api.groq.com/openai/v1"
- Paste your Groq API Key in the OpenAI API Key field
- Click "Add Model" and add qwen-2.5-coder-32b
Now you have a smart and fast code assistant at your service. Explore for free and let us know what you think!
We're Hiring for Developer Support
If you're passionate about AI and love helping others solve problems and build AI applications, we have new full-time and part-time Developer Support Specialist roles available now. Whether you have experience in developer support or simply a passion for AI, come help us help developers build on Groq! It's a great opportunity to work directly with Groq engineers.
See the attachment for more info. Many other exciting job openings here.
Developer_Support_Specialist.pdf