parakeet-tdt-0.6b-v3

by nvidia

600M-parameter multilingual ASR model with 25-language support and automatic language detection

420K downloads/month

600M parameters

Identifiers

Model ID

nvidia/parakeet-tdt-0.6b-v3

Feature URI

mixpeek://transcription@v1/nvidia_parakeet_tdt_v3

Overview

Parakeet TDT 0.6B v3 is NVIDIA's multilingual speech-to-text model, built on the FastConformer-TDT architecture and trained on over 670,000 hours of audio from NVIDIA's Granary dataset. It extends the English-only v2 to 25 European languages with automatic language detection. It achieves a 6.34% average WER on the HuggingFace Open ASR Leaderboard while keeping throughput among the highest of any multilingual model.

On Mixpeek, Parakeet TDT powers cost-efficient multilingual transcription pipelines where Whisper-class accuracy is needed at lower compute cost. Its 600M-parameter size and FastConformer architecture keep throughput high when batch-processing large audio and video archives across European languages.

Architecture

The model pairs a FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder, for 600M parameters in total. It uses a unified SentencePiece tokenizer with an 8,192-token vocabulary, supports audio up to 3 hours via local attention mode, and identifies the spoken language automatically across 25 languages.
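To make the Token-and-Duration idea concrete, here is a toy sketch of greedy TDT-style decoding: at each step the decoder picks both a token and a duration (how many encoder frames to skip), rather than advancing one frame at a time. This is not NVIDIA's implementation. The `JointFn` type, the `durations` list, and the `maxSymbolsPerFrame` guard are assumptions made for illustration, and a real model would produce the logits from its encoder and prediction networks.

```typescript
// Hypothetical stand-in for the model's joint network: given the current
// frame index and the tokens emitted so far, return token and duration logits.
interface JointOutput {
  tokenLogits: number[];
  durationLogits: number[];
}
type JointFn = (frame: number, history: number[]) => JointOutput;

interface EmittedToken {
  token: number;
  frame: number; // frame at which the token was emitted
}

// Index of the largest value (first one wins on ties).
function argmax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

function greedyTdtDecode(
  numFrames: number,
  joint: JointFn,
  blankId: number,
  durations: number[], // assumed duration set, e.g. [0, 1, 2, 3, 4]
  maxSymbolsPerFrame = 10
): EmittedToken[] {
  const emitted: EmittedToken[] = [];
  let t = 0;
  let symbolsAtFrame = 0;
  while (t < numFrames) {
    const out = joint(t, emitted.map((e) => e.token));
    const token = argmax(out.tokenLogits);
    let skip = durations[argmax(out.durationLogits)];
    if (token !== blankId) {
      emitted.push({ token, frame: t });
      symbolsAtFrame++;
    }
    // A blank with zero duration (or too many tokens on one frame) would
    // never advance, so force a one-frame step in those cases.
    if (skip === 0 && (token === blankId || symbolsAtFrame >= maxSymbolsPerFrame)) {
      skip = 1;
    }
    if (skip > 0) {
      t += skip;
      symbolsAtFrame = 0;
    }
  }
  return emitted;
}
```

Because predicted durations let the loop jump over several frames at once, the decoder makes fewer joint-network calls than a frame-by-frame transducer, which is one plausible reason for the throughput described above. The recorded `frame` indices are also the raw material for token-level timestamps.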

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest a video and transcribe it with Parakeet TDT v3.
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/interview.mp4" },
  feature_extractors: [{
    name: "transcription",
    version: "v1",
    params: {
      model_id: "nvidia/parakeet-tdt-0.6b-v3"
    }
  }]
});

Capabilities

25 European languages with automatic detection
1.93% WER on LibriSpeech test-clean
6.34% average WER on Open ASR Leaderboard
Audio up to 3 hours via local attention mode
Word-level timestamps included
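
Word-level timestamps make it straightforward to build subtitles. The sketch below groups words into SRT cues. The `WordTimestamp` shape (a word plus start and end times in seconds) is an assumption for illustration; the actual field names in a transcription result may differ.

```typescript
// Hypothetical shape for one timestamped word; times are in seconds.
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Group consecutive words into numbered cues of at most maxWordsPerCue words.
function wordsToSrt(words: WordTimestamp[], maxWordsPerCue = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWordsPerCue) {
    const group = words.slice(i, i + maxWordsPerCue);
    const start = formatSrtTime(group[0].start);
    const end = formatSrtTime(group[group.length - 1].end);
    const text = group.map((w) => w.word).join(" ");
    cues.push(`${cues.length + 1}\n${start} --> ${end}\n${text}`);
  }
  return cues.join("\n\n");
}
```

Splitting on a fixed word count keeps the example short; a production subtitler would more likely break cues at pauses or punctuation.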

Use Cases on Mixpeek

Multilingual video transcription for European content libraries at scale

Batch audio processing of podcasts and meetings across 25 languages

Cost-efficient ASR pipeline replacing Whisper for European language content

Benchmarks

Dataset	Metric	Score	Source
LibriSpeech test-clean	WER	1.93%	NVIDIA, 2025 — Model Card
LibriSpeech test-other	WER	3.59%	NVIDIA, 2025 — Model Card
Open ASR Leaderboard (avg)	WER	6.34%	NVIDIA, 2025 — Model Card
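
For readers unfamiliar with the metric, WER (word error rate) is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. A minimal sketch follows, assuming simple lowercasing and whitespace tokenization; benchmark evaluations typically apply a more thorough text normalizer, so this will not reproduce the scores above exactly.

```typescript
// Lowercase and split on whitespace (a simplification of real normalizers).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// WER = word-level Levenshtein distance / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }
  // prev[j] holds the edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" gives one substitution and one deletion over six reference words, a WER of 2/6, or about 33%.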