nvidia/diar_streaming_sortformer_4spk-v2 support please

Please support nvidia/diar_streaming_sortformer_4spk-v2. It’s the best speaker diarization model I’ve used, way better than pyannote.

Please consider supporting it. If not, at least add speaker diarization to the current Whisper model :slight_smile: