GPT-OSS-20b/120b decreased performance

Hey guys, over the past week we have been experiencing a slight decrease in the performance of the gpt-oss-20b and 120b models. We integrated them a while ago and they have worked well, but lately they are behaving differently in terms of instruction following. I can see that the reasoning gets stuck on only one part of the prompt and can't move beyond it, which was not the case before. Has anyone had a similar experience, and has anything changed recently that could have triggered this behaviour?

Thank you for the report; it should be more stable now. Please let me know if you're still running into problems.


Hi @yawnxyz, what was the issue? I'm trying to confirm that the problem was the model getting "dumber": it was just blindly following the first few instructions. Was the fix related to the model's performance?

We've been quietly making improvements to the GPT-OSS services (we'll be announcing a new feature soon!), and these unfortunately impacted the production OSS models.

Hopefully you'll see far fewer tool-calling and JSON schema / structured output generation errors now, though!
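For reference, this is the kind of structured-output request that was previously prone to returning malformed JSON. It's only a minimal sketch against an OpenAI-compatible chat completions endpoint; the base URL, API key, model identifier, and schema are placeholders for whatever your own deployment uses:

```python
# Minimal sketch of a structured-output (JSON schema) request against an
# OpenAI-compatible endpoint; base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",             # placeholder key
)

response = client.chat.completions.create(
    model="gpt-oss-120b",  # placeholder model identifier
    messages=[
        {"role": "user", "content": "Extract the city and country from: 'Berlin, Germany'."}
    ],
    # Ask the model to return JSON conforming to this schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)

# The content should now be valid JSON matching the schema above.
print(response.choices[0].message.content)
```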


@yawnxyz We still see a major reduction in performance. The fixes you mentioned didn't help. Is there an action plan that will improve the situation today or tomorrow?

If not, we will need to move to your competition. Can't bear this anymore.

Has the situation improved? I'm looking for a high-speed (200+ tokens/s), cache-enabled GPT-OSS 120b model with generous rate limits.