To cleanly transcribe audio with Whisper, you first need to clean the audio with a vocal separator. It would also be nice to have a fast audio/vocals separator.
So I wish there were a streaming version of Whisper and Demucs.
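For context, the non-streaming version of the pipeline I mean looks roughly like the sketch below: isolate the vocals with the Demucs command-line tool, then hand the vocals stem to Whisper. This is only a sketch under assumptions: it assumes the `demucs` and `openai-whisper` packages are installed, that Demucs uses its usual `<out_dir>/<model>/<track>/vocals.wav` output layout, and the helper names (`demucs_command`, `vocals_path`, `transcribe_clean`) are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def demucs_command(audio: Path, out_dir: Path, model: str = "htdemucs") -> list[str]:
    """Build the Demucs CLI call that splits a track into vocals / no_vocals."""
    return [
        sys.executable, "-m", "demucs",
        "--two-stems=vocals",   # only separate vocals from everything else
        "-n", model,            # pretrained model name (assumed: htdemucs)
        "-o", str(out_dir),     # root folder for the separated stems
        str(audio),
    ]


def vocals_path(audio: Path, out_dir: Path, model: str = "htdemucs") -> Path:
    """Where Demucs is assumed to write the vocals stem for this track."""
    return out_dir / model / audio.stem / "vocals.wav"


def transcribe_clean(audio: Path, out_dir: Path = Path("separated"),
                     whisper_model: str = "base") -> str:
    """Run Demucs on the whole file, then transcribe the vocals stem with Whisper."""
    subprocess.run(demucs_command(audio, out_dir), check=True)

    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(whisper_model)
    result = model.transcribe(str(vocals_path(audio, out_dir)))
    return result["text"]
```

Both steps here need the complete file before they start, which is exactly why a streaming variant would help.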
Thanks for the feature request! We’re looking at other options for speech to text, but I’ve honestly not played with any Demucs models or systems. Do you know of any really good implementations that combine Whisper and Demucs?