
Hi all,

Our team is building an AI workflow in Japanese. We are prioritizing response time, but LLaMA-3.1 8B, the smallest model available in the production environment, degrades output quality too much. As a result, we settled on LLaMA-3.3 70B to strike a balance between response time and quality.

In our experiments, the Qwen models outperform all the others, but they are not feasible to use in the production environment (we see a server error rate of roughly 50%). If the Qwen models become production-ready, we would definitely switch to them from the LLaMA models. Is there a clear roadmap for when these models will be available? Thanks in advance.

Hi Excalibar!

We’ve been continually tuning and improving the Qwen-32B model, making it faster and error-free. It should already be able to handle large workloads without failing.

Please run your evals again and check whether you’re still seeing the errors. From our observability, you shouldn’t be seeing error rates like that, but if the problem persists, please email me at jzheng@groq.com so I can help diagnose it with the engineers.
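In case it’s useful, here’s a minimal sketch of how you could measure the error rate from your side. It assumes the `groq` Python SDK with a GROQ_API_KEY set in the environment; the model ID below is a placeholder, so substitute whatever ID the console lists for Qwen-32B.

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment
MODEL = "qwen-32b"  # placeholder ID; use the one listed in your console
N_REQUESTS = 100

errors = 0
for i in range(N_REQUESTS):
    try:
        client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "こんにちは、自己紹介してください。"}],
        )
    except Exception as exc:  # count any failed request as an error
        errors += 1
        print(f"request {i}: {type(exc).__name__}: {exc}")

print(f"error rate: {errors / N_REQUESTS:.0%} over {N_REQUESTS} requests")
```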

Hi Excalibar, since you're seeing strong quality from the Qwen models but a ~50% server error rate, your best course is to contact the Qwen model providers directly or monitor their official channels (e.g., Alibaba Cloud or ModelScope) for updates on production readiness. In the meantime, consider a hybrid setup: keep LLaMA-3.3 70B in production while using Qwen-32B for offline tasks or batch inference until its stability improves. A sketch of that hybrid routing follows below.
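Here is a minimal sketch of the hybrid routing, assuming the `groq` Python SDK; both model IDs are placeholders, so use the IDs your console actually lists.

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment
PRIMARY = "qwen-32b"                  # placeholder Qwen model ID
FALLBACK = "llama-3.3-70b-versatile"  # placeholder LLaMA-3.3 70B model ID

def complete(messages):
    """Try the primary (Qwen) model first; on any error, retry with the fallback."""
    try:
        return client.chat.completions.create(model=PRIMARY, messages=messages)
    except Exception:
        return client.chat.completions.create(model=FALLBACK, messages=messages)

resp = complete([{"role": "user", "content": "日本語で自己紹介してください。"}])
print(resp.choices[0].message.content)
```

This keeps latency low when Qwen is healthy and only pays the fallback cost on failed requests; you could also add a retry or circuit breaker in front of the fallback if error bursts are common.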

