We’re integrating the Groq LLM with a real-time voice agent pipeline (LiveKit Agents). We’ve observed that the model sometimes emits tool calls as inline XML-style markup embedded inside normal conversational text, for example:
… Let me check that for you. <function=get_service>{…}
This creates issues for streaming / TTS-first systems, because the surrounding text is spoken before the tool executes, and the tool call cannot be cleanly separated without custom parsing and suppression logic.
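To illustrate the kind of custom parsing we would rather not maintain, here is a minimal sketch. It assumes the markup is always `<function=NAME>` followed directly by a JSON object; that shape is inferred from the example above and is not a documented format. The helper name `split_inline_tool_call` is hypothetical. It works on a complete string only, so a streaming pipeline would also need to buffer tokens until the markup is complete.

```python
import json
import re

# Assumed shape (not a documented format): <function=NAME> followed by a JSON object.
_TOOL_OPEN = re.compile(r"<function=([A-Za-z_][\w.-]*)>")


def split_inline_tool_call(text):
    """Split model output into (speakable_text, tool_call_or_None).

    Hypothetical helper: everything before the first <function=...> tag is
    treated as text that is safe to send to TTS.
    """
    match = _TOOL_OPEN.search(text)
    if match is None:
        # No inline markup: the whole string can be spoken.
        return text, None

    spoken = text[:match.start()].rstrip()
    rest = text[match.end():].lstrip()
    try:
        # raw_decode parses one JSON value and ignores any trailing text,
        # such as a closing tag.
        arguments, _end = json.JSONDecoder().raw_decode(rest)
    except json.JSONDecodeError:
        # Arguments are missing or truncated: suppress the markup, report no call.
        return spoken, None
    return spoken, {"name": match.group(1), "arguments": arguments}
```

Even with something like this in place, the text before the tag may already have been sent to TTS by the time the tag arrives in the stream, which is why we would prefer a supported way to avoid mixed output.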
Could you clarify:
- Whether emitting inline function markup inside natural language responses is expected behavior
- If there is a supported way to force tool calls to be returned only via the structured tool call channel (no mixed text)
- Or if there is a recommended prompt or parameter configuration to prevent mixed text + function markup output
This is specifically impacting real-time voice use cases where speech must be serialized correctly around tool execution.
Thanks for your help.