LLM emits function calls as inline markup in natural language response

We’re integrating a Groq-hosted LLM into a real-time voice agent pipeline (LiveKit Agents). We’ve observed that the model sometimes emits tool calls as inline XML-style markup embedded in normal conversational text, for example:

… Let me check that for you. <function=get_service>{…}

This creates issues for streaming / TTS-first systems, because the surrounding text is spoken before the tool executes, and the tool call cannot be cleanly separated without custom parsing and suppression logic.
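For context, our interim workaround is roughly the following kind of parsing/suppression sketch (the `<function=name>{...}` tag shape is assumed from the observed output above; function and variable names are illustrative, not part of any Groq or LiveKit API):

```python
import re

# Matches inline tool-call markup of the observed shape:
#   <function=get_service>{"query": "..."}
# Non-greedy brace match; assumes the JSON args contain no nested objects.
TOOL_CALL_RE = re.compile(r"<function=(?P<name>\w+)>(?P<args>\{.*?\})", re.DOTALL)

def split_text_and_tool_calls(chunk: str) -> tuple[str, list[tuple[str, str]]]:
    """Separate speakable text from embedded tool-call markup.

    Returns (clean_text, [(tool_name, raw_json_args), ...]) so the text
    can go to TTS while the calls are routed to tool execution.
    """
    calls = [(m.group("name"), m.group("args")) for m in TOOL_CALL_RE.finditer(chunk)]
    clean = TOOL_CALL_RE.sub("", chunk).strip()
    return clean, calls

text, calls = split_text_and_tool_calls(
    'Let me check that for you. <function=get_service>{"query": "status"}'
)
# text  -> 'Let me check that for you.'
# calls -> [('get_service', '{"query": "status"}')]
```

This only works per accumulated chunk; in a true token stream the markup can be split across deltas, so we would additionally have to buffer text whenever a partial `<function=` prefix appears, which is exactly the custom logic we’d like to avoid.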

Could you clarify:

  • Whether emitting inline function markup inside natural language responses is expected behavior

  • If there is a supported way to force tool calls to be returned only via the structured tool call channel (no mixed text)

  • Or if there is a recommended prompt or parameter configuration to prevent mixed text + function markup output

This is specifically impacting real-time voice use cases where speech must be serialized correctly around tool execution.

Thanks for your help.

Hi Patrick, I’m going to reproduce this — what model did you use?

Hi, I am currently using llama-3.3-70b-versatile.

Oh interesting, could you please DM me a reproducible cURL?

Hi, you’ll have to excuse me, but I don’t seem to be able to DM you. I do have a cURL which reproduces the issue.

You should have gotten a DM from me!