Tool calling errors on both gpt-oss models

I get a lot of tool calling errors on the gpt-oss models. Qwen and Kimi work fine…

'message': 'BadRequestError: litellm.BadRequestError: Error code: 400 - {\'error\': {\'message\': \'litellm.BadRequestError: GroqException - {"error":{"message":"Failed to parse tool call arguments as JSON","type":"invalid_request_error","code":"tool_use_failed","failed_generation":"import csv\\\\nmax_country=None\\\\nmax_val=-1\\\\nwith open(\\\'/workspace/project/wip/06895f81-cdd6-72fa-8000-1ca84933bf86/06895f82-51d8-7ce5-8000-f10f12198cc8/df.csv\\\', newline=\\\'\\\') as f:\\\\n reader=csv.DictReader(f)\\\\n for row in reader:\\\\n try:\\\\n val=float(row[\\\'6\\\'])\\\\n except Exception as e:\\\\n continue\\\\n if val\\\\u003emax_val:\\\\n max_val=val\\\\n max_country=row[\\\'0\\\']\\\\nprint(max_country, max_val)\\\\n\\\\n"}}\\n. Received Model Group=coding\\nAvailable Model Group Fallbacks=None\', \'type\': None, \'param\': None, \'code\': \'400\'}}'


Hi, thanks for this report! Could you provide an example request that’s failing? Would love to investigate this more for you.

Our team has been working hard to improve tool calling over the last few days. Can you try again and let me know if you’re still seeing the same issue?

Hi everyone

I’ve run into a recurring issue when using ChatGroq with LangChain tools. Specifically, when defining a tool that takes a Union of Pydantic models as input, the model often fails to generate the expected arguments in the correct schema.


Minimal Reproduction Example

from typing import Union
from langchain.tools import tool
from langchain_groq import ChatGroq
from pydantic import BaseModel
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv

load_dotenv()

# --- Define Pydantic models ---
class AddInput(BaseModel):
    """Defines two integers which needs to be added together."""
    a: int
    b: int

class ConcatInput(BaseModel):
    """Defines two strings which needs to be concatenated."""
    first: str
    second: str

@tool
def process_input(data: Union[AddInput, ConcatInput]) -> str:
    """Either add two integers or concatenate two strings."""
    if isinstance(data, AddInput):
        return f"Sum is: {data.a + data.b}"
    elif isinstance(data, ConcatInput):
        return f"Concatenated string: {data.first + data.second}"

prompt_template = PromptTemplate.from_template("{user_prompt}")
llm = ChatGroq(
    model="openai/gpt-oss-120b",
    reasoning_format="parsed",
)
model = llm.bind_tools([process_input])

test_agent = prompt_template | model

response = test_agent.invoke({"user_prompt": "Add 3 and 5 using process_input tool"})


Observed Error

BadRequestError: Error code: 400 - {
  'error': {
    'message': 'Tool call validation failed: tool call validation failed: parameters for tool process_input did not match schema: errors: [`/data`: expected object, but got array, `/data`: expected object, but got array]',
    'type': 'invalid_request_error',
    'code': 'tool_use_failed',
    'failed_generation': '{"name": "process_input", "arguments": {\n  "data": [3, 5]\n}}'
  }
}


What’s Happening

  • The tool schema expects data to be an object (e.g. {"a":3,"b":5}).

  • The model instead produces an array: [3,5] (both shapes are compared in the snippet after this list).

  • This causes validation to fail and prevents the tool from being invoked correctly.
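For concreteness, here are the two argument shapes side by side as Python dicts: the array form taken from the failed_generation above, and the object forms that the Union[AddInput, ConcatInput] schema actually accepts.

# What the model emitted for process_input (from failed_generation above):
invalid_args = {"data": [3, 5]}  # array, fails schema validation

# What the schema actually accepts:
valid_add_args = {"data": {"a": 3, "b": 5}}                      # AddInput variant
valid_concat_args = {"data": {"first": "foo", "second": "bar"}}  # ConcatInput variant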


Why This Matters

This makes it very difficult to use Union-typed Pydantic models for tools, since the model cannot reliably emit the correct schema. I’ve tested with multiple prompts, but the generated arguments consistently break validation.


Questions

  1. Is this a known limitation with ChatGroq models and Union inputs?

  2. Are there recommended workarounds beyond splitting into multiple separate tools (sketched below)?

  3. Could schema guidance or better alignment help the model emit valid tool calls?
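(For reference, the splitting workaround from question 2 would look roughly like the sketch below, reusing the @tool import and the llm from the reproduction above; add_numbers and concat_strings are just illustrative names. I would prefer to keep the single Union-typed tool.)

# Workaround sketch: one flat tool per operation, so each tool's schema is a
# plain object and no Union is involved.
@tool
def add_numbers(a: int, b: int) -> str:
    """Add two integers."""
    return f"Sum is: {a + b}"

@tool
def concat_strings(first: str, second: str) -> str:
    """Concatenate two strings."""
    return f"Concatenated string: {first + second}"

model = llm.bind_tools([add_numbers, concat_strings])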


Would appreciate guidance or confirmation from the team/community on whether this is a bug, an alignment issue, or simply a modeling limitation.

Thanks!

Hi

Not sure if this is the same issue or a new one. We can see that with Groq, random tool calls are generated in some cases. Is that related to caching?

We are working on the system and, between releases, needed to change the tools for an agent.

But Groq still seems to act as if the agent from the previous release, with its old tool set, were deployed.

Actions:

  1. Deploy AI MCP Agent with tools - [tool1, tool2, tool3].
  2. New release - deploy AI MCP Agent with [tool1, tool2]

So, in that case Groq called tool3 even though it was no longer available.
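(A rough defensive check for this situation might look like the sketch below; the response shape assumes Groq's OpenAI-compatible chat completions API, and filter_stale_tool_calls is an illustrative name rather than something from our system.)

def filter_stale_tool_calls(response, available_tools):
    """Keep only tool calls whose names were actually sent with the request.

    response is assumed to be an OpenAI-compatible chat completion object,
    i.e. response.choices[0].message.tool_calls with .function.name.
    """
    kept = []
    for tool_call in response.choices[0].message.tool_calls or []:
        if tool_call.function.name in available_tools:
            kept.append(tool_call)
        else:
            # e.g. the stale tool3 call described above
            print(f"Dropping call to removed tool: {tool_call.function.name}")
    return kept

# After the new release only tool1 and tool2 are sent:
# safe_calls = filter_stale_tool_calls(response, {"tool1", "tool2"})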

Thanks!

Hi there – is it possible to give specific examples of the tools you’re referring to? I’m trying to understand if these tools are custom defined tools that you’re sending in, in which case it seems more likely that it could be somehow caching related, or if this tool is one of the tools gpt-oss was trained to use, in which case it’s likely a model capability issue.

We ran a few iterations and in some cases we have seen a fallback to tavily_search.

However, the example given was done with the following tools: [list, search, execute]

We have removed list (for internal reasons), but it still calls “list”.

I’m able to reproduce the error on gpt-oss-120b, but switching to moonshotai/kimi-k2-instruct-0905 causes my test harness to pass; could you confirm this is the case for you as well? It seems like oss-120b isn’t able to process the union properly?

For the removed tool calling issue, I’m having trouble reproducing it; could you try giving them more explicit / project specific names like my_project_list / my_project_search? I’d be curious to see if it’s calling just list or your more specific my_project_list function name.

Hi

Just tried this with a simple setup for gpt-oss-120b and could not reproduce it. Not sure why.

We will try to deploy a more complex case on dev and see if it works there. It will be a bit more obvious if it breaks.

There is another item: sometimes the model fails to generate JSON (as in structured output). This is not related, but I just wanted to mention it.

Thanks for raising the issue!

The models can unfortunately get temperamental sometimes with tool calling and JSON generation; this is something we're addressing in the backend right now with more guardrails.

Hi

We have the same problem.
The API documentation states that we can disable tool validation:

  • disable_tool_validation (boolean, optional, defaults to false)

    If set to true, groq will return called tools without validating that the tool is present in request.tools. tool_choice=required/none will still be enforced, but the request cannot require a specific tool be used.
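(For reference, a minimal sketch of passing this flag with the Groq Python SDK, assuming it forwards extra body parameters the same way the OpenAI client does; the flag name comes from the docs quoted above, and the tool definition here is only a placeholder.)

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "process_input",
        "description": "Either add two integers or concatenate two strings.",
        "parameters": {"type": "object", "properties": {"data": {"type": "object"}}},
    },
}]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Add 3 and 5 using the process_input tool"}],
    tools=tools,
    # Per the docs above this only skips the "is the tool present in
    # request.tools" check; it does not skip argument schema validation.
    extra_body={"disable_tool_validation": True},
)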

However, this does not include disabling tool call schema validation.

Is there a way to bypass schema validation and let our program handle invalid schemas?

MRE

Thanks, I’m investigating this; I’m able to reproduce it.
