Question 1

What languages are supported?

Accepted Answer

Whisper supports 99 languages including English, Spanish, French, German, Dutch, Chinese, Japanese, Arabic, and many more. Auto-detection works well for most languages.

Question 2

Can I transcribe video files?

Accepted Answer

Yes! Upload MP4, MKV, WebM, or AVI files. We extract the audio track and transcribe it.
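
As a rough illustration of what "extract the audio track" can look like, here is a minimal sketch that builds an ffmpeg command for the four container formats listed above. Using ffmpeg at all is an assumption, since the answer does not say which tool the service runs. The helper name `build_extract_command`, the `SUPPORTED_VIDEO` set, and the choice of 16 kHz mono WAV output are also hypothetical.

```python
from pathlib import Path

# Container formats named in the answer above.
SUPPORTED_VIDEO = {".mp4", ".mkv", ".webm", ".avi"}


def build_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that keeps only the audio track.

    Assumption: ffmpeg is on PATH. The real service may use a different tool.
    """
    extension = Path(video_path).suffix.lower()
    if extension not in SUPPORTED_VIDEO:
        raise ValueError(f"unsupported video format: {extension or '(none)'}")
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",            # drop the video stream
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz
        audio_path,
    ]
```

To run it, pass the returned list to `subprocess.run(command, check=True)`. Keeping command construction separate from execution makes the format check easy to test without ffmpeg installed.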

Question 3

What's the difference between TXT and SRT?

Accepted Answer

TXT gives you plain text: just the words, with no timing information. SRT gives you numbered, timestamped subtitles that you can import into video editors, YouTube, or media players.
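
To make the difference concrete, here is a minimal sketch that renders the same transcript both ways. It assumes the transcript arrives as a list of segments with `start`, `end` (in seconds), and `text` keys. That shape and the function names are illustrative, not necessarily what the tool uses internally.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_txt(segments: list[dict]) -> str:
    """Plain text: the words only, joined with spaces."""
    return " ".join(segment["text"].strip() for segment in segments)


def to_srt(segments: list[dict]) -> str:
    """SRT: numbered cues, each with a time range and its text."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Hypothetical sample data for illustration.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " Welcome back."},
]
print(to_txt(segments))
print(to_srt(segments))
```

The TXT output is a single line of words, while each SRT cue carries an index and a `start --> end` range that players use to show the text at the right moment.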

Question 4

How long does transcription take?

Accepted Answer

Roughly 1 minute of processing per 5 minutes of audio. A 30-minute podcast takes about 6 minutes. The first transcription may be slower as the model loads.
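
The arithmetic behind that estimate can be sketched as follows. The 5:1 ratio comes from the answer above. The function name and the optional `model_load_minutes` parameter are hypothetical, and the answer gives no figure for model loading, so the default is zero.

```python
def estimate_processing_minutes(
    audio_minutes: float,
    audio_per_processing_minute: float = 5.0,  # ratio stated in the answer above
    model_load_minutes: float = 0.0,           # unknown in practice; assumed zero
) -> float:
    """Rough processing-time estimate: audio length divided by the speed ratio."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes / audio_per_processing_minute + model_load_minutes


print(estimate_processing_minutes(30))  # 30 / 5 = 6.0 minutes
```

Treat the result as a ballpark figure only, since actual speed depends on server load and audio content.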

Question 5

How accurate is it?

Accepted Answer

Whisper is one of the most accurate speech recognition models available. It copes well with accents, background noise, and multiple speakers. Accuracy is highest for clear speech in major languages and tends to drop with heavy noise, overlapping speech, or less widely spoken languages.

Question 6

Is my file safe?

Accepted Answer

Your file is uploaded to our EU server over HTTPS, processed, and automatically deleted within 1 hour. We never listen to or share your audio, and we do not keep it after it has been deleted.
