Chunked Audio Upload for Speech-to-Text Processing

Hello,

I have a feature request:
Chunked Audio Upload for Speech-to-Text Processing

Current Limitation:
The API currently requires the full audio file to be uploaded before processing, leading to increased latency.

Proposed Feature:
Add an API endpoint to support chunked audio uploads during recording, allowing processing to begin as audio is being sent.
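
To make the request concrete, here is a rough client-side sketch of what chunked upload could look like. No such endpoint exists in the API today, so the `send` callable is purely hypothetical and stands in for whatever transport a real endpoint would use. The default chunk size of 32,000 bytes is also an assumption: it is about one second of 16 kHz, 16-bit mono PCM.

```python
import io
from typing import BinaryIO, Callable, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int = 32_000) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, chunk) pairs until the stream is exhausted."""
    seq = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield seq, chunk
        seq += 1


def upload_chunked(
    stream: BinaryIO,
    send: Callable[[int, bytes], None],
    chunk_size: int = 32_000,
) -> int:
    """Pass each chunk to `send` as soon as it is read; return the chunk count.

    `send` is a hypothetical placeholder for a call to a chunked-upload
    endpoint, which the API does not currently offer.
    """
    count = 0
    for seq, chunk in iter_chunks(stream, chunk_size):
        send(seq, chunk)
        count += 1
    return count


if __name__ == "__main__":
    # Simulate a recording with an in-memory buffer and log what would be sent.
    recording = io.BytesIO(b"\x00" * 70_000)
    upload_chunked(recording, lambda seq, chunk: print(seq, len(chunk)))
```

The sequence number is included so that a server could reassemble or order chunks; whether a real endpoint would need it depends on the transport chosen.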

Benefit:
Reduces upload latency, improving user experience.

This feature would significantly improve real-time processing capabilities and user satisfaction.

Please provide feedback on this feature request.

Thanks!

Hi there,

Thank you for the product feedback! We’ve been exploring features such as streaming responses back from the Whisper API while it transcribes large files. We don’t have an ETA for launch, though.

Best,

Jan

This would be absolutely killer for real-time audio processing agents. If I could upload an audio stream and begin downloading Whisper’s output at the same time, you could build a VoIP agent that starts tool calling the second a specific keyword leaves a person’s mouth. Please make this!
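
As a rough illustration of the keyword-trigger idea, the sketch below scans transcript fragments as they arrive and reports the first fragment at which a keyword becomes visible. It assumes a streaming transcription API would deliver text fragments that can be concatenated directly; no such format has been announced, and `first_trigger` is a hypothetical helper, not part of any SDK.

```python
from typing import Iterable, Optional


def first_trigger(fragments: Iterable[str], keyword: str) -> Optional[int]:
    """Return the index of the first fragment after which `keyword` appears
    in the accumulated transcript, or None if it never appears.

    Matching is case-insensitive and runs on the accumulated text, so a
    keyword split across two fragments is still detected.
    """
    transcript = ""
    needle = keyword.lower()
    for index, fragment in enumerate(fragments):
        transcript += fragment.lower()
        if needle in transcript:
            return index
    return None


if __name__ == "__main__":
    # Assumed fragment stream; a real API may segment text differently.
    print(first_trigger(["please can", "cel my ", "order"], "cancel"))
```

An agent could start its tool call as soon as this returns an index instead of waiting for the full transcript.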

Can’t say any more yet but we’re working on these and extremely excited!!