Gpt-oss-20b/120b decreased performance

Hey guys, over the past week we've been experiencing a slight decrease in the gpt-oss-20b and 120b models' performance. We integrated them a while ago and have used them for some time now, but lately they are acting differently in terms of following instructions. I can see that the reasoning is focused on only one part of the prompt and it just can't move beyond it, which was not the case before. Are there any similar experiences out there, and has something happened lately that triggered this kind of behaviour?

Thank you for the report, it should be more stable now. Please let me know if you’re still running into problems.

Hi @yawnxyz, what was the issue? I'm trying to confirm whether the problem was that the model got "dumber": it was just blindly following the first few instructions. Was the fix related to model performance?

We've been quietly improving the GPT-OSS services (we'll be announcing a new feature soon!), and one of those changes unfortunately impacted the production OSS models.

Hopefully you'll see far fewer tool calling and JSON schema / structured output generation errors now though!
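For anyone hitting those structured output errors, this is the usual shape of a JSON-mode request to an OpenAI-compatible chat endpoint. A minimal sketch: the model name and prompts are illustrative assumptions, and the payload is only built and parsed locally, not sent:

```python
import json

# Hypothetical request payload for an OpenAI-compatible chat endpoint
# (model name and prompt contents are illustrative, not from this thread).
request = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "system",
         "content": "Reply in JSON with keys 'answer' and 'confidence'."},
        {"role": "user", "content": "Is the service healthy?"},
    ],
    # json_object mode asks the server for syntactically valid JSON;
    # the desired keys still have to be described in the prompt itself.
    "response_format": {"type": "json_object"},
}

# A well-formed reply should then parse cleanly on the client side:
reply = '{"answer": "yes", "confidence": 0.9}'
parsed = json.loads(reply)
```

Structured output errors in this setup typically show up as replies that fail the `json.loads` step, so that parse is a cheap place to detect them.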

@yawnxyz > We still see a major reduction in performance. The fixes you mentioned didn't help. Is there an action plan that will improve the situation today or tomorrow?

If not, we'll need to move to your competition. Can't bear this anymore.

Has the situation improved? I'm looking for a high-speed (200+ tokens/s), cache-enabled GPT-OSS 120b model with generous rate limits.

We have been experiencing very high latencies and are also getting errors:

groq.InternalServerError: Error code: 503 - {'error': {'message': 'openai/gpt-oss-120b is currently over capacity. Please try again and back off exponentially. Visit https://groqstatus.com to see if there is an active incident.', 'type': 'internal_server_error'}}
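The error message above asks clients to "back off exponentially". A minimal sketch of that retry pattern, using a stdlib-only stand-in exception (`RuntimeError`) to keep it self-contained; in real code `retryable` would be something like `(groq.InternalServerError,)`:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      retryable=(RuntimeError,)):
    """Retry fn() with exponential backoff plus jitter on retryable errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # retries exhausted, surface the last error
            # 1s, 2s, 4s, ... capped at max_delay; random jitter avoids
            # synchronized retry storms against an over-capacity endpoint.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))

# Usage sketch with a hypothetical flaky call that fails twice, then succeeds:
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 over capacity")
    return "ok"

result = call_with_backoff(flaky, base_delay=0.0)
```

The jitter factor is a common refinement over plain doubling; without it, many clients retrying in lockstep can re-create the very capacity spike they are backing off from.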

Same here: super high variance and, worse, random 499 errors after 120+ seconds with no response from the API. We're jumping ship to Cerebras now - this has become too unreliable for production use.

We're investigating the 499 errors and variance issues now. To help us isolate the cause, please clarify whether you're seeing this on 20b, 120b, or both.

Also, when exactly did you notice this started?

120b (we don't use 20b, so we don't know if it's affected). The random 499s started on Jan 26th but had a strong uptick three days ago. Variance has been on and off.