[Issue] Tool Calling Failures on Groq LLMs via Pydantic-AI (OpenAI baseline vs Groq)

Hi everyone,

I’m the creator of madin (github: kinyugo/madin), a library for preparing documents for agentic RAG. Recently, I’ve been experimenting with running it on Groq-hosted LLMs. While the output quality itself is good, I’ve run into serious inconsistencies with tool calling.

Here’s what I’ve observed so far:

  • Without explicitly passing tool_choice: required, Groq models almost always fail to call tools.
  • Even when tool_choice: required is set, the models sometimes attempt to call non-existent tools.
  • By contrast, OpenAI’s models handle tool calls reliably with the same setup (madin uses pydantic-ai under the hood).
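To make the failure modes concrete outside pydantic-ai, here is a minimal sketch of a request against Groq's OpenAI-compatible endpoint plus a guard for the hallucinated-tool case. The helper names `build_payload` and `is_known_tool` are illustrative, not part of any library, and the tool schema is a trimmed-down stand-in:

```python
# Sketch: an OpenAI-compatible Groq request that forces tool use, plus a
# guard against hallucinated tool names. The endpoint and model mirror the
# thread; build_payload and is_known_tool are illustrative helpers, not a
# real library API.

GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"

tools = [{
    "type": "function",
    "function": {
        "name": "get_json_from_data",
        "description": "Extract structured JSON from a short string of text",
        "parameters": {
            "type": "object",
            "properties": {"data": {"type": "string"}},
            "required": ["data"],
        },
    },
}]

def build_payload(model: str, user_content: str, tools: list) -> dict:
    """Assemble a chat request; without tool_choice the models often skip the tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "tools": tools,
        "tool_choice": "required",
    }

def is_known_tool(call_name: str, tools: list) -> bool:
    """Reject tool calls whose name was never declared (the hallucinated-tool case)."""
    return call_name in {t["function"]["name"] for t in tools}

payload = build_payload("openai/gpt-oss-120b",
                        "Extract structured data from: John Doe, age 30", tools)
```

Checking the returned `tool_calls[i].function.name` against `is_known_tool` before dispatching is one way to fail fast when the model invents a tool instead of silently erroring downstream.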

For reference, here are some example traces from Logfire:

  1. openai:gpt-4.1 (baseline): Works well, with errors unrelated to tool calling.
  2. groq:openai/gpt-oss-120b: Tool call fails but the model partially recovers. (Ran with tool_choice: required).

I’d really like to adopt Groq services long-term since their latency is excellent (and the GPT-OSS models perform well on other providers like Hugging Face). However, the tool calling experience is currently very unreliable, which makes it difficult to use in production.

Has anyone else encountered similar issues with Groq + tool calls in pydantic-ai? Any known workarounds?


As a follow-up to my earlier post, I wanted to share one more trace and a reproducible notebook.

Here’s another example from Logfire:

To make this easier to verify, I’ve also prepared a notebook that reproduces the runs I described.

Curious if anyone else has tested this model specifically, or if there are known workarounds to improve tool call reliability on Groq.


Same problem here. Groq calls fail with:
openai.APIError: Tool choice is none, but model called a tool

I see your tool is called `json`, but sometimes it’s useful to be really explicit with the function name — under the hood it’s gpt-oss trying to figure out what the tool names are. It’s not as powerful as something like sonnet-4-5, so you have to spoon-feed it a lot more information.

this for example works pretty well:

curl --request POST \
    --url https://api.groq.com/openai/v1/chat/completions \
    --header 'authorization: Bearer ID' \
    --header 'content-type: application/json' \
    --data '{
    "messages": [
        {
            "role": "user",
            "content": "Extract structured data from: '\''John Doe, age 30, lives in New York'\''"
        }
    ],
    "model": "openai/gpt-oss-120b",
    "temperature": 1,
    "max_completion_tokens": 8192,
    "top_p": 1,
    "stream": false,
    "stop": null,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_json_from_data",
                "description": "Extract and return structured JSON data from a short string of unstructured text",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "data": {
                            "type": "string",
                            "description": "The raw text data to parse and extract information from"
                        },
                        "schema": {
                            "type": "object",
                            "description": "The expected JSON schema structure to extract",
                            "properties": {
                                "type": {
                                    "type": "string",
                                    "enum": ["object"]
                                },
                                "properties": {
                                    "type": "object",
                                    "description": "Field definitions for the extracted data"
                                },
                                "required": {
                                    "type": "array",
                                    "items": {
                                        "type": "string"
                                    },
                                    "description": "List of required fields"
                                }
                            }
                        }
                    },
                    "required": [
                        "data",
                        "schema"
                    ]
                }
            }
        }
    ]
}'

@yawnxyz The tool definitions are handled by pydantic-ai, so I don’t have the flexibility to change them. I have tested it with other small models such as gpt-5-small and I haven’t observed any issues with tool calls.

Tool calling fails with larger models like kimi-k2-instruct as well, so I am not sure it’s about model size.

oh I see! I’ll experiment a bit with pydantic-ai and report back (I’m mostly in JS)

how often does this fail, %-wise?

(I usually call tools vanilla, via fetch / curl, and at least on oss-120b I get almost 100% success, so it might be some kind of mismatch of prompt/tool/model/naming and pydantic-ai)

It seems that your services regressed because I never had issues with the old kimi-k2 and previous models.

Currently it fails about 95% of the time. The output is usually correct, but the tool calls fail even after model retries. I have had to shift to other providers.

The experience is quite consistent across Python libraries. I have tried instructor as well and I get the same tool-calling issue.
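To put a number on claims like "fails about 95% of the time", a tiny measurement harness along these lines can help. `run_once` here is a stub standing in for one real pydantic-ai/Groq agent call that returns True only when a valid tool call was made; the simulated outcomes are illustrative, not measured data:

```python
# Sketch of a tool-call reliability harness. run_once stands in for one real
# pydantic-ai / Groq agent call that returns True iff a valid tool call was
# made; the simulated outcomes below are illustrative, not measured data.

def tool_call_success_rate(run_once, n: int = 20) -> float:
    """Fraction of n runs in which the model produced a valid tool call."""
    return sum(1 for _ in range(n) if run_once()) / n

# Simulate a provider that calls the tool correctly in 1 run out of 20,
# roughly matching the ~95% failure rate reported above.
outcomes = iter([True] + [False] * 19)
rate = tool_call_success_rate(lambda: next(outcomes), n=20)
print(f"tool-call success rate: {rate:.0%}")  # prints "tool-call success rate: 5%"
```

Swapping the stub for a real agent invocation makes it easy to compare providers (or model versions) on the same prompt and tool set.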

oh interesting, I’ll bring this up to the Kimi team to fix. Thanks so much for flagging this!

You are welcome. I am happy to help and offer as much feedback as I can.

Groq is an amazing platform. However, the tool calling failures make it pretty unusable for any production workload. I hope this gets fixed soon.

Your repo is cool. We do lots of RAG pipelines where I work.

Have you tried Docling for unstructured data extraction? Docling is open-source and runs locally.

@Harry Thanks for the kind sentiments. Yes, I do primarily use docling for document extraction.

My lib is supposed to take the markdown output of tools such as docling and ensure that it has a hierarchical structure we can use for agentic RAG, similar to PageIndex. I aim to build a more flexible and customizable pipeline.


I was unfamiliar with PageIndex, but looks great!


I would also like to add that Kimi K2 0905 is awesome and fast; however, we get rampant tool-call issues.

The tool definitions are all very precise and exact, but about 80% of the time it fails to call the tool and just hallucinates an answer.

E.g. a get_date_and_time tool, with the prompt “what’s the year?”
Result: the model doesn’t call the tool and just responds with “2025”. The same thing happens for “what’s the day today?”: the model just hallucinates a wrong date.
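For clarity, a tool like that is trivial. Here it is as a plain function (the pydantic-ai registration wiring is omitted, and the function name simply mirrors the example above), which is why a wrong date in the response can only mean the model skipped the call and guessed:

```python
# The tool from the example above as a plain function (pydantic-ai wiring
# omitted). It is deterministic and trivially correct, so a wrong date in
# the response can only mean the model skipped the call and guessed.
from datetime import datetime, timezone

def get_date_and_time() -> str:
    """Return the current UTC date and time in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()

print(get_date_and_time())
```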

We’ve also had issues with the gpt-oss models calling tools that don’t exist. I believe this to be a platform issue rather than the individual models themselves. I think this is a large blocker for teams like mine migrating to Groq from OpenAI.


Thank you for the reports, we’re working on rolling out some tool call updates/patches very soon!

did you solve the issue?

We’ve been rolling out more updates/patches under the hood; would love to hear if it’s solved your issues as well

Wait, what? Tell us more. Would love to run evals if you have patched stuff. Any insights would help, especially which models?

Oh, we’re improving our harness, but constrained decoding is juuust around the corner, and the team is working hard to get it out the door.