Tool_use_failed on Llama4 models

Hi,

I'm testing a very structured prompt that outputs JSON.

With Llama 3.3-70B my success rate is close to 100%; with both Llama 4 models the failure rate is 100%.

The error is tool_use_failed: the output is valid JSON, but it's probably encapsulated in some kind of wrapper that raises the error (I'm using LangChain's ChatGroq with a Pydantic model as the schema).

What I don't understand is this enormous difference between the two models. Is there any way to disable tool use or to force the same output behavior as Llama 3.3?

Thanks

The issue likely stems from the Llama 4 models automatically triggering tool-use behavior through LangChain's ChatGroq, which interferes with your structured JSON output validation via Pydantic. Since Llama 3.3-70B works reliably, a quick fix is to explicitly disable tool use, e.g. by passing model_kwargs={"tool_choice": "none"} when initializing the model. Alternatively, enforce a raw output format in your prompt by instructing the model not to call any tools and to output plain JSON only.
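A minimal sketch of the first option, assuming the langchain_groq package and an example Llama 4 model ID on Groq (both the model name and whether Groq accepts "none" without bound tools are assumptions worth verifying):

from langchain_groq import ChatGroq

# Sketch: forward tool_choice="none" to the underlying chat completions request
# so the model is never asked to emit a tool call. Some OpenAI-compatible APIs
# reject tool_choice when no tools are supplied, so test this against Groq first.
llm = ChatGroq(
    model="meta-llama/llama-4-scout-17b-16e-instruct",  # example model ID, adjust as needed
    model_kwargs={"tool_choice": "none"},
)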

Hi @Igoor, is this using the new Structured Outputs we just released? If not, could you try using this: https://console.groq.com/docs/structured-outputs
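Roughly what that looks like with the groq Python SDK, from memory, so please double-check the exact request shape and which models support it against the docs; the model ID below is just an example:

from typing import List
from groq import Groq
from pydantic import BaseModel

class Answers(BaseModel):
    answers: List[str]

client = Groq()
# Sketch: constrain the response to the JSON schema derived from the Pydantic model
resp = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",  # example model ID
    messages=[{"role": "user", "content": "Give me three short answers."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "Answers", "schema": Answers.model_json_schema()},
    },
)
answers = Answers.model_validate_json(resp.choices[0].message.content)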

Hi, thanks for the answer.

Not sure about that, since I've been making my calls in LangChain using .with_structured_output(schema) for a long time, giving it a Pydantic schema for my answers (which is actually very simple; it's just this):

from typing import List
from pydantic import BaseModel

class Answers(BaseModel):
    answers: List[str]


I don't understand whether under the hood that is just JSON mode or it actually leverages this new structured output from Groq. Any idea?
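For context, the call looks roughly like this (a minimal sketch; the model name is just an example):

from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile")  # example model name
# Bind the Pydantic schema; not sure whether this goes through tool calling
# or JSON mode under the hood, which is exactly my question
structured_llm = llm.with_structured_output(Answers)
result = structured_llm.invoke("Answer the questions below ...")  # result is an Answers instance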

Ohh I think this uses the old JSON mode, and probably doesn't use the new constrained outputs yet. Will make a note with the engineering team.

Thanks! Would love to try the constrained outputs as soon as they are available via the langchain library!