Gpt-oss-20b/120b decreased performance

Hey guys, over the past week we've been experiencing a slight decrease in the gpt-oss-20b and 120b models' performance. We integrated them a while ago and have used them for some time now, but lately they are acting differently in terms of following instructions. I can see that the reasoning is focused on only one part of the prompt and it just can't move beyond it, which was not the case before. Are there any similar experiences out there, and has something happened lately that triggered this kind of behaviour?

Thank you for the report, it should be more stable now. Please let me know if you’re still running into problems.

Hi @yawnxyz, what was the issue? I'm trying to confirm whether the problem was that the model got "dumber": it was just blindly following the first few instructions. Was the fix related to model performance?

We've been quietly improving the GPT-OSS services (we'll be announcing a new feature soon!), and one of those changes unfortunately impacted the production OSS models.

Hopefully you'll see far fewer tool calling and JSON schema / structured output generation errors now though!
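For anyone hitting those structured output errors, this is the usual shape of a JSON-mode request to an OpenAI-compatible chat endpoint. A minimal sketch: the model name and prompts are illustrative assumptions, and the payload is only built and parsed locally, not sent:

```python
import json

# Hypothetical request payload for an OpenAI-compatible chat endpoint
# (model name and prompt contents are illustrative, not from this thread).
request = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "system",
         "content": "Reply in JSON with keys 'answer' and 'confidence'."},
        {"role": "user", "content": "Is the service healthy?"},
    ],
    # json_object mode asks the server for syntactically valid JSON;
    # the desired keys still have to be described in the prompt itself.
    "response_format": {"type": "json_object"},
}

# A well-formed reply should then parse cleanly on the client side:
reply = '{"answer": "yes", "confidence": 0.9}'
parsed = json.loads(reply)
```

Structured output errors in this setup typically show up as replies that fail the `json.loads` step, so that parse is a cheap place to detect them.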

@yawnxyz > We still see a major reduction in performance. The fixes you mentioned didn't help. Is there an action plan that will improve the situation today or tomorrow?

If not, we'll need to move to your competition. Can't bear this anymore.

Has the situation improved? I'm looking for a high-speed (200+ tokens/s), cache-enabled GPT-OSS 120b model with generous rate limits.

We have been experiencing very high latencies and are also getting errors:

groq.InternalServerError: Error code: 503 - {'error': {'message': 'openai/gpt-oss-120b is currently over capacity. Please try again and back off exponentially. Visit https://groqstatus.com to see if there is an active incident.', 'type': 'internal_server_error'}}
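The error message above asks clients to "back off exponentially". A minimal sketch of that retry pattern, using a stdlib-only stand-in exception (`RuntimeError`) to keep it self-contained; in real code `retryable` would be something like `(groq.InternalServerError,)`:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      retryable=(RuntimeError,)):
    """Retry fn() with exponential backoff plus jitter on retryable errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # retries exhausted, surface the last error
            # 1s, 2s, 4s, ... capped at max_delay; random jitter avoids
            # synchronized retry storms against an over-capacity endpoint.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))

# Usage sketch with a hypothetical flaky call that fails twice, then succeeds:
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 over capacity")
    return "ok"

result = call_with_backoff(flaky, base_delay=0.0)
```

The jitter factor is a common refinement over plain doubling; without it, many clients retrying in lockstep can re-create the very capacity spike they are backing off from.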

Same here: super high variance and, worse, random 499 errors after 120+ seconds with no response from the API. We're jumping ship to Cerebras now - this has become too unreliable for production use.

We're investigating the 499 errors and variance issues now. To help us isolate the cause, please clarify whether you're seeing this on 20b, 120b, or both.

Also, when exactly did you notice this started?

120b (we don't use 20b, so we don't know if it's affected). The random 499s started on Jan 26th but had a strong uptick three days ago. Variance has been on and off.