distil-large-v3
by distil-whisper
6x faster speech recognition distilled from Whisper Large v3
distil-whisper/distil-large-v3
mixpeek://transcription@v1/distilwhisper_large_v3
Overview
Distil-Whisper Large v3 is a knowledge-distilled variant of OpenAI's Whisper Large v3. It stays within 1% word error rate (WER) of the teacher model on long-form audio while running 6.3x faster. Distillation copies the full encoder and keeps a subset of maximally spaced decoder layers, which cuts the parameter count by 51% without significant quality loss.
On Mixpeek, Distil-Whisper is the recommended transcription model for high-throughput pipelines where you need to process large audio and video libraries quickly while maintaining near-Whisper-level accuracy.
Architecture
Distil-Whisper Large v3 is an encoder-decoder Transformer. The encoder is copied in full from Whisper Large v3 and frozen during training. The decoder keeps only a subset of the teacher's decoder layers, initialized from maximally spaced layers of the teacher. The student is trained via knowledge distillation on pseudo-labeled audio, that is, audio transcribed by the teacher model.
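To make "maximally spaced" concrete, here is a minimal sketch of one way to pick evenly spread layer indices from a teacher decoder. The function name `maximallySpacedLayers` is hypothetical, and the layer counts in the example (32 teacher layers, 2 student layers) are assumptions based on the published Whisper large and Distil-Whisper configurations, not values stated on this page. The actual distillation code may select layers differently.

```typescript
// Hypothetical helper: picks `studentLayers` indices spread as evenly as
// possible across a teacher stack of `teacherLayers` layers, always
// including the first and last teacher layer.
function maximallySpacedLayers(teacherLayers: number, studentLayers: number): number[] {
  if (studentLayers < 2 || studentLayers > teacherLayers) {
    throw new RangeError("studentLayers must be between 2 and teacherLayers");
  }
  // Distance between consecutive picks, measured in teacher layers.
  const step = (teacherLayers - 1) / (studentLayers - 1);
  return Array.from({ length: studentLayers }, (_, i) => Math.round(i * step));
}

// Assumed configuration: 32 teacher decoder layers, 2 student decoder layers.
console.log(maximallySpacedLayers(32, 2)); // [ 0, 31 ]
```

With two student layers, the spacing rule reduces to taking the first and last teacher layers, which is why the student decoder can be so much smaller than the teacher's.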
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [
    {
      name: "audio_transcription",
      version: "v1",
      params: {
        model_id: "distil-whisper/distil-large-v3"
      }
    }
  ]
});
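For a library of many files, the same request shape can be generated per URL. The sketch below is a plain helper, not part of the Mixpeek SDK: `IngestRequest` and `buildTranscriptionRequests` are hypothetical names, and the field names simply mirror the ingest call shown above. It only builds the payloads and does not contact any service.

```typescript
// Hypothetical type mirroring the fields used in the ingest call above.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: Array<{
    name: string;
    version: string;
    params: { model_id: string };
  }>;
}

// Hypothetical helper: builds one transcription request per media URL.
function buildTranscriptionRequests(collectionId: string, urls: string[]): IngestRequest[] {
  return urls.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [
      {
        name: "audio_transcription",
        version: "v1",
        params: { model_id: "distil-whisper/distil-large-v3" },
      },
    ],
  }));
}

const requests = buildTranscriptionRequests("my-collection", [
  "https://example.com/video-1.mp4",
  "https://example.com/video-2.mp4",
]);
console.log(requests.length); // 2
```

Each element could then be passed to `mx.collections.ingest(...)` in a loop. Whether the SDK offers a dedicated batch endpoint or enforces rate limits is not covered on this page, so check the SDK documentation before ingesting at scale.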
Capabilities
- 6.3x faster than Whisper Large v3
- Within 1% WER of the teacher on long-form audio
- 51% fewer parameters than Whisper Large v3
- Word-level timestamps and language detection
- Robust to background noise and accents
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| LibriSpeech (test-clean) | WER | ~2.1% | Gandhi et al., 2023 — within 1% of Whisper Large v3 |
| OOD short-form (4 datasets) | Avg WER | Within 1.5% of teacher | Distil-Whisper model card |
| Long-form (sequential) | WER delta | < 1% vs Large v3 | Distil-Whisper model card |
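The WER figures in the table count word-level substitutions, deletions, and insertions relative to the number of reference words. As an illustration only, the sketch below computes WER with a standard edit-distance recurrence over whitespace-separated words. The function name `wordErrorRate` is hypothetical, and real benchmark scores typically apply text normalization (casing, punctuation) first, which this sketch omits.

```typescript
// Hypothetical helper: WER = (substitutions + deletions + insertions) / reference word count.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new RangeError("reference must contain at least one word");
  }
  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion: reference word missing from hypothesis
        curr[j - 1] + 1,    // insertion: extra word in hypothesis
        prev[j - 1] + cost  // match or substitution
      ));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One substituted word out of six reference words.
console.log(wordErrorRate("the cat sat on the mat", "the cat sat on a mat"));
```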
Performance
756M parameters; 6.3x faster than Whisper Large v3 and within 1% WER of the teacher on long-form audio
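The 756M figure and the 51% reduction quoted earlier can be cross-checked with a little arithmetic. The sketch below assumes a teacher size of 1550M parameters, which is OpenAI's published size for the large Whisper models and is not stated on this page. The speedup helper is a rough planning estimate: it treats the 6.3x figure as a constant ratio, and actual throughput depends on hardware, batch size, and audio length.

```typescript
const STUDENT_PARAMS_M = 756;  // from this page
const TEACHER_PARAMS_M = 1550; // assumption: published size of Whisper large models
const SPEEDUP = 6.3;           // from this page, relative to Whisper Large v3

// Fraction of the teacher's parameters removed by distillation.
function parameterReduction(studentParams: number, teacherParams: number): number {
  return 1 - studentParams / teacherParams;
}

// Rough estimate of processing time after switching models, assuming the
// quoted speedup applies uniformly to the whole job.
function estimatedStudentMinutes(teacherMinutes: number, speedup: number): number {
  return teacherMinutes / speedup;
}

const reduction = parameterReduction(STUDENT_PARAMS_M, TEACHER_PARAMS_M);
console.log((reduction * 100).toFixed(1) + "%"); // 51.2%
console.log(estimatedStudentMinutes(63, SPEEDUP).toFixed(1) + " min");
```

Under that assumption the reduction works out to about 51%, which matches the figure quoted in the capabilities list.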
Research Paper
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
arxiv.org
Build a pipeline with distil-large-v3
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.