🦜 Parakeet v3 as Speech Recognition model

complete

Lucho

In addition to the current basic local Whisper model, Parakeet could be added as an ASR model option. It offers excellent performance with low latency, especially for European languages, which are spoken by the primary user base of Hedy AI. Parakeet is highly efficient, requires only 1 GB of storage, and runs well on mid-range devices manufactured in or since 2024, making it a valuable upgrade for improved local transcription quality, even offline.

Note:

An optimized speaker-identification workflow can be envisioned in which Parakeet-generated transcripts are sequentially processed by Sortformer, Nvidia's state-of-the-art model for speaker diarization. Currently, Sortformer v2.1 identifies up to 4 speakers; nevertheless, an 8-speaker version, better suited for meetings, is in progress with an ETA in Q2 2026.
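The workflow described in the note could be sketched roughly as follows, assuming the ASR step yields word-level timestamps and the diarization step yields speaker time segments. This does not call Parakeet or Sortformer; the `Word` and `SpeakerSegment` types, the timestamps, and the speaker labels are hypothetical stand-ins used only to illustrate how the two outputs might be merged.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical ASR output: one word with start/end times in seconds."""
    text: str
    start: float
    end: float


@dataclass
class SpeakerSegment:
    """Hypothetical diarization output: one speaker turn in seconds."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, segments):
    """Label each word with the speaker whose segment overlaps it most."""
    labeled = []
    for w in words:
        best_speaker, best_overlap = "unknown", 0.0
        for seg in segments:
            o = overlap(w.start, w.end, seg.start, seg.end)
            if o > best_overlap:
                best_speaker, best_overlap = seg.speaker, o
        labeled.append((best_speaker, w.text))
    return labeled


def group_turns(labeled):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, text in labeled:
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


# Made-up example data, not real model output.
words = [
    Word("Hello", 0.0, 0.4),
    Word("everyone", 0.5, 1.0),
    Word("Hi", 1.3, 1.5),
    Word("there", 1.6, 1.9),
]
segments = [
    SpeakerSegment("speaker_0", 0.0, 1.1),
    SpeakerSegment("speaker_1", 1.2, 2.0),
]

turns = group_turns(assign_speakers(words, segments))
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

Assigning by maximum time overlap is only one simple choice; a word that overlaps no segment is labeled `unknown` rather than guessed.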

December 13, 2025

Julian Pscheid updated the status to complete

Parakeet has been added previously. At this point we recommend migrating to Nemotron.

Julian Pscheid

Lucho:

Thanks! Sortformer is on our radar. It's actually openly available now (no early access needed), and the audio framework we build on already ships it, along with a newer streaming diarization model that handles up to 10 speakers. We're testing both as possible upgrades to speaker detection.

Lucho

Julian Pscheid The early access is for the upcoming version of Sortformer, supporting up to 8 speakers. The full public release is scheduled for September 2026.

FerTech

We really need speaker diarization. I have to redo transcriptions from the audio to get it, and not having it natively in Hedy is generating so much work. Thank you!

Julian Pscheid

I love seeing some new ASR model options. We'll monitor this!

Lucho

Julian Pscheid After several months of using voice dictation on a frequent basis, Parakeet has proven to be very efficient and well-suited for real-time applications.

Looking forward to innovative uses of Parakeet + Sortformer + LLM transcript optimization.

Julian Pscheid

Lucho That's great to hear. Do you mind me asking how you run it?

Lucho

Julian Pscheid I mainly use Parakeet multilingual as a local ASR model with voice-dictation apps like SuperWhisper and Alter, which offer one-click download and setup. It's snappy and works well even for Live Captions on a laptop with 8 GB of RAM. Speed and performance are particularly strong in English, French, Spanish, and German, where it outperforms Whisper-Large-v3-Turbo in many real-time, local-device scenarios.
There is strong potential for Nvidia's Sortformer (multi-speaker supervision) combined with Parakeet or Nemotron ASR to improve live audio transcription and diarization during multi-participant meetings. A recent real-time example can be seen here: https://www.youtube.com/watch?v=AThOsk2qJbs
This article summarizes the main advantages of Parakeet: https://list.alterhq.com/p/the-future-of-voice-is-here-nvidia-parakeet-is-out-of-the-cage
A few limitations to keep in mind: Sortformer currently supports up to 4 speakers (an 8-speaker version is in progress, with a new target release in Q3 2026), and Parakeet supports 25 languages, mainly European. Broad smartphone adoption is expected during 2026, with optimized versions available for CoreML/NPU, and some apps already allow using Parakeet on iOS and Android at the time of writing.