I'm testing a very structured prompt that outputs JSON.
With Llama 3.3-70B my success rate is close to 100%; with both Llama 4 models my failure rate is 100%.
The error is tool_use_failed: the output is valid JSON, but it's apparently wrapped in something that makes parsing fail (I'm using LangChain's ChatGroq with a Pydantic schema).
What I don't understand is this enormous difference between the models. Is there any way to disable tool use, or to force output like Llama 3.3's?
The issue likely stems from the Llama 4 models automatically triggering tool-use behavior in LangChain via ChatGroq, which interferes with your structured JSON output validation through Pydantic. Since Llama 3.3-70B works reliably, a quick fix is to explicitly disable tool use when initializing the model, e.g. by passing model_kwargs={"tool_choice": "none"}. Alternatively, enforce a raw output format in your prompt by instructing the model not to call any tools and to output plain JSON only, then validate it yourself.
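Here's a minimal sketch of the second approach, bypassing tool calling by prompting for plain JSON and validating with a PydanticOutputParser. The model name and prompt wording are illustrative, and whether this avoids the tool_use_failed path on your setup is an assumption to verify:

# Sketch: prompt for plain JSON, then validate with Pydantic.
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from pydantic import BaseModel

class Answers(BaseModel):
    answers: List[str]

parser = PydanticOutputParser(pydantic_object=Answers)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Do not call any tools. Reply with plain JSON only.\n"
     "{format_instructions}"),
    ("human", "{question}"),
]).partial(format_instructions=parser.get_format_instructions())

llm = ChatGroq(model="meta-llama/llama-4-maverick-17b-128e-instruct")  # illustrative

chain = prompt | llm | parser
result = chain.invoke({"question": "Name three prime numbers, as strings."})
print(result.answers)  # e.g. ['2', '3', '5']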
Not sure about that, since I've been making my calls in LangChain using .with_structured_output(schema) for a long time, giving it a Pydantic schema for my answers, which is actually very simple:

class Answers(BaseModel):
    answers: List[str]
I don't understand whether, under the hood, it's just JSON mode or it actually leverages this new structured output from Groq. Any idea?
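For context, my call looks roughly like the sketch below. I've noticed with_structured_output also takes a method argument, which (if I read the langchain_groq docs right) defaults to tool/function calling and can be set to "json_mode" instead; whether that sidesteps the tool_use_failed error on Llama 4 is an assumption I haven't verified:

# Sketch of my current call plus the possible json_mode workaround.
from typing import List

from langchain_groq import ChatGroq
from pydantic import BaseModel

class Answers(BaseModel):
    answers: List[str]

llm = ChatGroq(model="meta-llama/llama-4-scout-17b-16e-instruct")  # illustrative

# Current call: defaults to method="function_calling", i.e. tool use.
structured_llm = llm.with_structured_output(Answers)

# Possible workaround: ask for JSON mode explicitly. With json_mode the
# prompt itself must instruct the model to answer in JSON matching the schema.
json_llm = llm.with_structured_output(Answers, method="json_mode")

result = json_llm.invoke(
    'List three primary colors. Respond in JSON with a single key '
    '"answers" holding a list of strings.'
)
print(result.answers)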