Add speech-to-speech model support

(Posted by Aly on the Groq Discord: https://discord.com/channels/1207099205563457597/1376241264101425273)


Hi Groq Team,

I'd like to request support for speech-to-speech (or voice-to-voice) models like Ultravox on the Groq inference engine.

Thanks!

Just to clarify, Ultravox is a speech-to-text LLM, not a speech-to-speech model. However, speech-to-text LLMs such as Ultravox and Microsoft Phi-4 Multimodal would still be helpful (at this point probably more so than speech-to-speech, not to mention that I am not aware of many open-source speech-to-speech models).

Ah yes, you’re right. We’re considering those, but as far as I’m aware there are no speech-to-speech (STS) models yet. That would be so cool, though.