Hello,

I have a transcription service that uses Whisper to transcribe audio, and I’m really happy with it so far. However, sometimes the API takes over 60s to transcribe (at which point the request is stopped), yet retrying completes the job in about 5s; see the attached logs. All times are in GMT+2.

Affected request ids:

  • req_01jynty635fhwrw808qwwcge08
  • req_01jynty5f9e1vsfryhbthev5bz
  • req_01jynty2y4e1nrz7yyk67pwmde
  • req_01jynty2x0fhra7njk896aztwt

Some requests a few days ago:

  • req_01jxmetn46fkgr342pnv2f9v9k
  • req_01jxmbmg4weg0ant4cs9q3w56j (this one had a high TTFT)

 

 

As you can see, all audios are under 20 minutes long. They are all compressed in 16kbps Opus format, so they are at most 33.5 MB in size. I haven’t been able to reproduce this error with my own audios, so unfortunately I can’t share any, but do let me know if you need some metadata.

Thank you for reporting this. I’ll take a look and get back to you.


Hi, is there an update?


I’m having trouble reproducing this; are you still running into these errors, and do you know if it’s triggered by a specific codec format / file size / language?


Up until July 4th, yes. After that we reduced the timeout to 20s, which may be a bit aggressive, but it has only slightly increased the occurrence of errors.
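For reference, the timeout-and-retry behaviour described above looks roughly like the sketch below. This is not our production code: `call_with_retries`, its parameters, and the backoff policy are hypothetical names chosen for illustration, and `request` stands in for whatever function actually sends one chunk to the transcription API and raises `TimeoutError` when the per-attempt timeout is exceeded.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[float], T],
    timeout_s: float = 20.0,   # per-attempt timeout (20s, as mentioned above)
    max_attempts: int = 3,     # assumption: the thread does not state a retry count
    backoff_s: float = 1.0,    # assumption: simple exponential backoff between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(timeout_s) and retry whenever it raises TimeoutError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return request(timeout_s)
        except TimeoutError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait 1x, 2x, 4x, ... the base backoff before the next attempt.
                sleep(backoff_s * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The injected `sleep` argument is only there so the wrapper can be exercised in tests without actually waiting.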

The audios are all transformed by the following command:

/opt/ffmpeg-layer/bin/ffmpeg -ss ${start} -to ${end} -i /tmp/input \
-vn \
-map_metadata -1 \
-ac 1 \
-c:a libopus \
-b:a 240k \
-application voip \
-compression_level 0 \
-threads 0 \
-y \
/tmp/output.ogg

We split each audio into chunks of 20 minutes (+15s of leeway) and then upload each of them separately to Groq. As you can see, the format is always the same: Opus at a 240 kbps bitrate giving a consistent . The final chunk, which is shorter than 20 minutes, has not had any problems. I will try lowering the length to about 15 minutes and the bitrate to 128kbps and see if that helps.
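To make the chunking and the size arithmetic concrete, here is a minimal sketch. It assumes the 15s of leeway is added to the end of each 20-minute chunk so that consecutive chunks overlap; the function names are hypothetical. The size figure is only the nominal bitrate times duration, so real files will differ somewhat (libopus encodes with variable bitrate by default, and the Ogg container adds overhead).

```python
from typing import List, Tuple


def chunk_windows(
    duration_s: float,
    chunk_s: float = 1200.0,  # 20 minutes
    leeway_s: float = 15.0,   # assumption: leeway extends the end of each chunk
) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds, usable as ffmpeg -ss / -to."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s + leeway_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # this window already reaches the end of the audio
        start += chunk_s
    return windows


def nominal_size_mb(length_s: float, bitrate_kbps: float) -> float:
    """Nominal audio payload in MB (10^6 bytes): bitrate x duration / 8."""
    return bitrate_kbps * 1000 * length_s / 8 / 1_000_000


# A 50-minute recording yields two overlapping full chunks and a shorter tail.
for start, end in chunk_windows(3000.0):
    print(start, end, round(nominal_size_mb(end - start, 240), 2))
```

At 240 kbps a full 1215s chunk comes to a nominal 36.45 MB, while the planned 15 minutes at 128 kbps would be about 14.4 MB. For comparison, 16 kbps over the same 1215s would only be about 2.4 MB, so the 240k in the command is the figure that fits the file sizes we see.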

 

If it helps with the investigation, the audios we get are usually recorded on a phone, far away from the speaker.

