🦜 Parakeet v3 as Speech Recognition model
Lucho
In addition to the current basic local Whisper model, Parakeet could be added as an ASR model option. It offers excellent performance with low latency, especially for European languages, which matter most to Hedy AI's primary user base. Parakeet is highly efficient: it requires only 1 GB of storage and runs well on mid-range devices manufactured from 2024 onward, making it a valuable upgrade for local transcription quality, even offline.
Note:
An optimized speaker-identification workflow can be envisioned, with Parakeet-generated transcripts being sequentially processed by Sortformer, Nvidia's state-of-the-art model for speaker diarization. Currently, Sortformer v2.1 identifies up to 4 speakers; nevertheless, an 8-speaker version, better suited for meetings, is in progress with an ETA of Q2 2026.
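A minimal sketch of the merge step in such a workflow, assuming the ASR model returns word-level timestamps and the diarization model returns speaker turns with start and end times. The `Word` and `SpeakerTurn` types and the function names are hypothetical and only illustrate the idea; they are not the API of Parakeet, Sortformer, or Hedy.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class SpeakerTurn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[Word], turns: list[SpeakerTurn]) -> list[tuple[str, str]]:
    """Label each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for w in words:
        best = max(
            turns,
            key=lambda t: overlap(w.start, w.end, t.start, t.end),
            default=None,
        )
        if best is None or overlap(w.start, w.end, best.start, best.end) == 0.0:
            labeled.append(("unknown", w.text))  # no turn covers this word
        else:
            labeled.append((best.speaker, w.text))
    return labeled


def to_lines(labeled: list[tuple[str, str]]) -> list[str]:
    """Group consecutive words from the same speaker into transcript lines."""
    groups: list[tuple[str, str]] = []
    for speaker, text in labeled:
        if groups and groups[-1][0] == speaker:
            groups[-1] = (speaker, groups[-1][1] + " " + text)
        else:
            groups.append((speaker, text))
    return [f"{speaker}: {text}" for speaker, text in groups]
```

Feeding the labeled lines to an LLM for cleanup would be a separate step after this merge.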
FerTech
We really need speaker diarization. Right now I have to redo transcriptions from the audio to get it, and not having it natively in Hedy is generating so much work. Thank you!
Julian Pscheid
I love seeing some new ASR model options. We'll monitor this!
Lucho
Julian Pscheid After several months of using voice dictation on a frequent basis, Parakeet has proven to be very efficient and well-suited for real-time applications.
Looking ahead to innovative uses of Parakeet + Sortformer + LLM transcript optimization.
Julian Pscheid
Lucho That's great to hear. Do you mind me asking how you run it?
Lucho
Julian Pscheid Mainly, I use Parakeet multilingual as a local ASR model with voice dictation apps like SuperWhisper and Alter (one-click download and setup). It's snappy and works great, even for Live Captions on a laptop with 8 GB of RAM. Compared to Whisper Large v3 Turbo, its speed and performance shine brightest in English, French, Spanish, and German.
No doubt that Parakeet + Sortformer's speaker supervision will improve live audio transcription for multi-participant meetings. Recent real-time example: https://www.youtube.com/watch?v=AThOsk2qJbs
This article summarizes the main advantages of Parakeet: https://list.alterhq.com/p/the-future-of-voice-is-here-nvidia-parakeet-is-out-of-the-cage
Limitations Noted: Currently, Sortformer supports up to 4 speakers (but an 8-speaker version is in progress) and Parakeet works for 25 languages (mainly European). Broader smartphone support is expected during 2026.
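Given these limits, an app could keep Whisper as a fallback. A rough sketch, assuming a hypothetical `choose_pipeline` helper: the language set below is only a placeholder containing the four languages named in this thread, not Parakeet's full list of 25, and the `"parakeet"` / `"whisper"` strings are illustrative labels rather than real model identifiers.

```python
# Placeholder subset: only the languages named in this thread.
# The full set of 25 supported languages is NOT listed here.
PARAKEET_LANGS = frozenset({"en", "fr", "es", "de"})

# Current speaker cap for Sortformer, as noted in the thread.
MAX_DIARIZED_SPEAKERS = 4


def choose_pipeline(language: str, expected_speakers: int) -> dict:
    """Pick an ASR model and decide whether diarization is worth running."""
    asr = "parakeet" if language.lower() in PARAKEET_LANGS else "whisper"
    # Diarize only when there are multiple speakers and the count fits the cap.
    diarize = 1 < expected_speakers <= MAX_DIARIZED_SPEAKERS
    return {"asr": asr, "diarize": diarize}
```

For example, `choose_pipeline("de", 3)` would select Parakeet with diarization, while an unsupported language or a 6-person meeting would fall back to Whisper or skip speaker labels.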