I’m trying to reproduce the error with this, but I can’t get it to spit out gibberish; are you still seeing errors on your end?
Oh, the other thing we added is prompt caching; so if the FIRST request spits out gibberish and you try the same exact prompt again, the gibberish might be cached.
Bust the cache by adding a timestamp or random value at the beginning of the message.
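A minimal sketch of that cache-busting idea, assuming an OpenAI-style `messages` payload like the curl example below (`bust_cache` is a hypothetical helper, not part of any SDK):

```python
import time
import uuid

def bust_cache(messages):
    """Prepend a unique nonce to the first user message so the prompt
    no longer matches a cached (possibly gibberish) completion."""
    nonce = f"[{int(time.time())}-{uuid.uuid4().hex[:8]}] "
    busted = [dict(m) for m in messages]  # shallow copies; don't mutate the caller's list
    for m in busted:
        if m.get("role") == "user":
            m["content"] = nonce + m["content"]
            break
    return busted

messages = [{"role": "user", "content": "Extract structured data from: 'John Doe, age 30'"}]
fresh = bust_cache(messages)  # first user message now starts with a unique prefix
```

Since the nonce sits at the very start of the prompt, the prefix the cache keys on changes on every request.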
curl --request POST \
  --url https://api.groq.com/openai/v1/chat/completions \
  --header 'authorization: Bearer ID' \
  --header 'content-type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "Extract structured data from: '\''John Doe, age 30, lives in New York'\''"
      }
    ],
    "model": "openai/gpt-oss-120b",
    "temperature": 1,
    "max_completion_tokens": 8192,
    "top_p": 1,
    "stream": false,
    "stop": null,
    "reasoning_format": "hidden",
    "disable_tool_validation": true,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_json_from_data",
          "description": "Extract and return structured JSON data from a short string of unstructured text",
          "parameters": {
            "type": "object",
            "properties": {
              "data": {
                "type": "string",
                "description": "The raw text data to parse and extract information from"
              },
              "schema": {
                "type": "object",
                "description": "The expected JSON schema structure to extract",
                "properties": {
                  "type": {
                    "type": "string",
                    "enum": ["object"]
                  },
                  "properties": {
                    "type": "object",
                    "description": "Field definitions for the extracted data"
                  },
                  "required": {
                    "type": "array",
                    "items": {
                      "type": "string"
                    },
                    "description": "List of required fields"
                  }
                }
              }
            },
            "required": [
              "data",
              "schema"
            ]
          }
        }
      }
    ]
  }'
The issue occurs when using tool calling with 3 or more tools, where the model reasons for a bit. It’s not easily reproducible. The pattern involves content followed by a tool-use call. Unfortunately, I cannot provide the actual data.
Happens for me when the model has to reason a lot and then forgets to send a response. If the user types "?", nothing comes back; but if I read the reasoning tokens and use the verb mentioned there to respond to the user, it works. Bizarro bug. The other option I’m considering is to provide a scratchpad, but that’s a patch, not a fix.
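That fallback ("if the model reasons but never answers, read the reasoning") can be sketched like this. The field names (`message`, `content`, `reasoning`) are assumptions based on an OpenAI-compatible response shape; adjust to whatever your client actually returns:

```python
def extract_reply(choice):
    """Crude patch for the 'model reasons but forgets to answer' bug:
    if the assistant message has no content, fall back to the reasoning
    text so the user at least gets something back.
    Field names are assumed, not guaranteed by any API."""
    msg = choice.get("message", {})
    content = (msg.get("content") or "").strip()
    if content:
        return content
    reasoning = (msg.get("reasoning") or "").strip()
    if reasoning:
        # surface the last line of reasoning, which usually states the intended action
        return reasoning.splitlines()[-1]
    return ""
```

It's a patch, not a fix, as noted above: you're repurposing reasoning tokens that weren't meant to be user-facing.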
If you find yourself with more than 3-4 tools and really large contexts, it always helps to break large tasks into smaller tasks; instead of “make Thanksgiving dinner plus all the potatoes and turkey, and for each dish do this and that”…
Try breaking the massive task into much smaller tasks, like “make the potatoes” and “make the turkey” — each with its own specific tasks and context. This decomposition of the main task into smaller units helps you get a much better definition of the jobs you want to accomplish; you can test each of them separately and swap out system prompts and tools separately for A/B testing, and it has the added effect of not overloading the model with too much stuff to do.
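The decomposition above can be sketched as plain data: each subtask gets a narrow system prompt and only the tools it needs, so each can be tested and A/B'd in isolation. `run_subtask` is a hypothetical stand-in for a single model call, not a real API:

```python
# One entry per small task, instead of one giant "make Thanksgiving dinner" prompt.
SUBTASKS = {
    "make_the_potatoes": {
        "system": "You prepare the mashed potatoes. Output a numbered step list.",
        "tools": ["get_recipe", "set_timer"],
    },
    "make_the_turkey": {
        "system": "You roast the turkey. Output a numbered step list.",
        "tools": ["get_recipe", "set_timer", "check_oven"],
    },
}

def run_subtask(name, spec):
    # Placeholder: in reality this would call the chat API with
    # spec["system"] as the system prompt and only spec["tools"] attached.
    return f"{name}: ran with {len(spec['tools'])} tools"

results = [run_subtask(name, spec) for name, spec in SUBTASKS.items()]
```

Because each unit is independent, swapping one subtask's system prompt or tool list doesn't touch the others.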
For Scratchpad — that seems to work on Claude Sonnet 4.5, but it slows things down a lot and significantly increases costs. I’d recommend trying the “break tasks down into the smallest unit that doesn’t break” strategy first.
I’m working on a real-time assistant right now that ends up using a bunch of tools:
I’ll create a “router” agent that generally chooses what classes of tools need to be used (e.g. Salesforce, Web search) and then I’ll have it generate an array of commands (basically JSON configs) that it’ll send to other functions/classes, e.g. a Salesforce function, that will agentically figure out what needs to be done for that task.
It works surprisingly well! I like using the “jobs to be done” framework to think about how to decompose really big tasks into smaller ones, wrap functions around them, and have a router essentially create a queue of commands to fire off these smaller, task-focused functions.
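A minimal sketch of that router-plus-queue pattern: the router agent emits an array of JSON commands, and each command is dispatched to a small task-focused handler. The handler names and command shape here are made up for illustration; in a real system each handler would itself be an agentic function (e.g. the Salesforce one figuring out which records to touch):

```python
def handle_salesforce(cmd):
    # Stand-in for an agentic Salesforce function.
    return f"salesforce: {cmd['action']}"

def handle_web_search(cmd):
    # Stand-in for an agentic web-search function.
    return f"web_search: {cmd['query']}"

HANDLERS = {
    "salesforce": handle_salesforce,
    "web_search": handle_web_search,
}

def dispatch(commands):
    """Fire each routed command off to its handler, in queue order."""
    results = []
    for cmd in commands:
        handler = HANDLERS.get(cmd["tool"])
        if handler is None:
            results.append(f"unknown tool: {cmd['tool']}")
            continue
        results.append(handler(cmd))
    return results

# In practice the router agent would generate this array of JSON configs:
queue = [
    {"tool": "salesforce", "action": "update_contact"},
    {"tool": "web_search", "query": "Acme Corp Q3 earnings"},
]
dispatch(queue)
```

The router only decides *which class* of tool to use; each handler owns its own context, prompts, and tools, which keeps any single model call small.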