About AI Voice Transcription
AI Voice Transcription runs OpenAI's Whisper speech-to-text model directly in your browser via transformers.js. Drop in an audio or video file, get a transcript with timestamps, and export it as plain text or as SRT or WebVTT subtitles ready for YouTube, Premiere or any video editor. The spoken language is auto-detected from fifteen supported languages. The model weights download once and are cached, so subsequent transcripts run fully offline. Voice memos, meeting recordings, podcasts and lecture notes: none of it ever leaves the device.
- No uploads
- Browser-only
- Works offline
- 100% free
How it works
1. Choose audio or video
Supported formats include MP3, WAV, M4A, OGG, MP4, MOV and WebM; for video files the tool extracts the audio track in-browser. Files up to about an hour long work comfortably on most laptops.
2. Pick a language (or auto)
Leave it on auto-detect, or pin a specific language for cleaner results on short clips with accents or background noise.
3. Transcribe and export
Whisper runs locally; you watch the words appear in real time. Download the result as .txt, .srt or .vtt, or copy it straight to the clipboard.
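The export step can be pictured as a small formatting pass over timestamped chunks. The sketch below is illustrative only and is not the tool's actual code: the `Chunk` shape and the names `formatTimestamp`, `toSrt` and `toVtt` are hypothetical. It relies on two facts about the formats: SRT separates milliseconds with a comma and numbers each cue, while WebVTT uses a period and starts with a `WEBVTT` header.

```typescript
// Hypothetical chunk shape: start/end in seconds plus the recognised text.
interface Chunk {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm. SRT uses ",", WebVTT uses ".".
function formatTimestamp(seconds: number, sep: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + sep + pad(ms, 3);
}

// SRT: numbered cues separated by a blank line.
function toSrt(chunks: Chunk[]): string {
  return chunks
    .map((c, i) =>
      String(i + 1) + "\n" +
      formatTimestamp(c.start, ",") + " --> " + formatTimestamp(c.end, ",") + "\n" +
      c.text.trim() + "\n")
    .join("\n");
}

// WebVTT: a "WEBVTT" header, a blank line, then cues without numbering.
function toVtt(chunks: Chunk[]): string {
  const cues = chunks.map((c) =>
    formatTimestamp(c.start, ".") + " --> " + formatTimestamp(c.end, ".") + "\n" +
    c.text.trim() + "\n");
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Calling `toSrt` or `toVtt` on the same chunk array gives two files that differ only in header, cue numbering and the millisecond separator, which is why one transcript can be offered in both formats.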
Related tools
Pull the audio track out of any video as MP3, WAV, AAC, or OGG. Configurable bitrate. ffmpeg.wasm in your browser.
Trim and clip an audio file — drag the waveform handles.
Convert a video clip to an animated GIF with two-pass palette extraction. Custom FPS + height. ffmpeg.wasm.
Format, clean and JSON-escape your prompts — strip invisible characters and fix smart quotes.
Frequently asked questions
Are my files uploaded to a server?
No. Every tool on SnapToolz runs entirely inside your browser using JavaScript and WebAssembly. Your file is read locally, processed in memory, and the result is offered as a download. Nothing is sent to a server — there isn't one to send to.
How accurate is the transcript?
Whisper is one of the strongest open speech models — for clean English audio it typically lands around 5–10% word error rate. Accuracy drops with heavy accents, overlapping speakers, low-bitrate audio, music underneath, or specialist vocabulary (drug names, legal terms, company names). Always proof-read before publishing.
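For readers who want to check a transcript against a hand-corrected reference, word error rate is the number of word substitutions, deletions and insertions divided by the number of words in the reference. The sketch below is a minimal illustration, not the tool's own evaluation code. The function names `words` and `wordErrorRate` are hypothetical, and the normalisation (lower-casing and stripping basic punctuation) is an assumption; other normalisation choices give slightly different numbers.

```typescript
// Assumed normalisation: lower-case, strip basic punctuation, split on whitespace.
function words(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"()]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// WER = word-level edit distance (substitutions + deletions + insertions)
// divided by the number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = words(reference);
  const hyp = words(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }
  // prev[j] holds the distance between the ref words seen so far and the first j hyp words.
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion: a reference word is missing from the transcript
        curr[j - 1] + 1,    // insertion: the transcript has an extra word
        prev[j - 1] + cost  // match or substitution
      ));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

One wrong word in a ten-word sentence gives a rate of 0.1, the upper end of the range quoted above.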
Is the output AI-generated, and can it hallucinate?
Yes. Whisper occasionally inserts plausible-sounding phrases during silence or noisy passages — a known limitation. Always check the transcript against the audio, especially for legal, medical or journalistic use. See our /disclaimer.
Which languages are supported?
Auto-detection covers 15 of the languages Whisper supports: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Polish, Turkish, Chinese, Japanese, Korean, Arabic and Hindi. The model can transcribe more, but the language selector is limited to those where the multilingual small model is most reliable.
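As a rough sketch of how a language choice could be turned into transcription settings, the helper below maps a selector value onto an options object. It is hypothetical throughout: the names `UI_LANGUAGES` and `buildTranscribeOptions` are invented for illustration. The option keys follow those accepted by the transformers.js speech-recognition pipeline (`language`, `task`, `return_timestamps`, `chunk_length_s`, `stride_length_s`), but the specific values, and the choice to leave `language` unset for "auto", are assumptions; how an unset language is handled can depend on the library version.

```typescript
// The fifteen languages listed in the answer above, in lower case.
const UI_LANGUAGES: string[] = [
  "english", "spanish", "french", "german", "italian",
  "portuguese", "dutch", "russian", "polish", "turkish",
  "chinese", "japanese", "korean", "arabic", "hindi",
];

// Hypothetical options object; keys mirror transformers.js pipeline options.
interface TranscribeOptions {
  language?: string;          // left unset for "auto" (assumption)
  task: "transcribe";
  return_timestamps: boolean; // needed for SRT/WebVTT export
  chunk_length_s: number;     // Whisper processes audio in 30-second windows
  stride_length_s: number;    // overlap between windows (assumed value)
}

function buildTranscribeOptions(choice: string): TranscribeOptions {
  const options: TranscribeOptions = {
    task: "transcribe",
    return_timestamps: true,
    chunk_length_s: 30,
    stride_length_s: 5,
  };
  const normalized = choice.trim().toLowerCase();
  if (normalized === "auto") {
    return options; // no language pinned
  }
  if (UI_LANGUAGES.indexOf(normalized) === -1) {
    throw new Error("Unsupported language: " + choice);
  }
  options.language = normalized;
  return options;
}
```

Rejecting anything outside the list mirrors the capped selector described above; the resulting object could then be passed as the second argument when calling the pipeline.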
Why is the first transcript slow?
The first run downloads the model weights (a few hundred MB). After that they are cached in your browser, so every subsequent transcript skips the download and runs fully offline.
Does it work offline?
Yes. SnapToolz is a Progressive Web App. After your first visit, the app is cached on your device and every tool keeps working without an internet connection.
Is SnapToolz free?
Yes — every tool is 100% free with no sign-up, no watermark, no hidden tier. The whole platform is open source and we have no plan to gate features.