Qwen3-ASR-1.7B

by Qwen

State-of-the-art open-source ASR for 52 languages with streaming and offline modes

320K downloads/month

1.7B params


Identifiers

Model ID

Qwen/Qwen3-ASR-1.7B

Feature URI

mixpeek://transcription@v1/qwen3_asr_1b_v1

Overview

Qwen3-ASR-1.7B is Alibaba's flagship open-source speech recognition model, supporting 52 languages and dialects. It pairs a 300M-parameter AuT audio encoder with a Qwen3-1.7B decoder and achieves state-of-the-art accuracy among open-source ASR models (a field that includes OpenAI's Whisper large-v3), while remaining competitive with the strongest proprietary speech APIs.

On Mixpeek, Qwen3-ASR powers multilingual transcription pipelines that need broad language coverage beyond European languages. Its dual-mode architecture supports both streaming inference over 1- to 8-second chunks and offline processing of long recordings, so the same model serves real-time and batch workloads.
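The chunk sizes above translate into simple window arithmetic on the client side. The sketch below is hypothetical: `chunkWindows` is an illustrative helper, not part of Qwen3-ASR or the Mixpeek SDK, and it assumes fixed, non-overlapping chunks, whereas a real streaming client may overlap or re-decode chunks.

```typescript
// Hypothetical helper (not a Qwen3-ASR or Mixpeek API): split a recording of
// `durationSec` seconds into fixed, non-overlapping chunks of `chunkSec` seconds.
// The chunk length is restricted to the 1 to 8 s streaming range described above.
function chunkWindows(durationSec: number, chunkSec: number): Array<[number, number]> {
  if (chunkSec < 1 || chunkSec > 8) {
    throw new RangeError(`chunkSec must be between 1 and 8, got ${chunkSec}`);
  }
  if (durationSec <= 0) return [];
  const windows: Array<[number, number]> = [];
  for (let start = 0; start < durationSec; start += chunkSec) {
    // The final chunk is shorter when the duration is not an exact multiple.
    windows.push([start, Math.min(start + chunkSec, durationSec)]);
  }
  return windows;
}

// Example: a 10-second clip streamed in 4-second chunks
console.log(chunkWindows(10, 4)); // → [[0, 4], [4, 8], [8, 10]]
```

Smaller chunks lower the latency before the first words appear; larger chunks give the model more acoustic context per step.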

Architecture

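The AuT audio encoder (300M parameters, attention-encoder-decoder design, hidden size 1024) downsamples its input 8x, producing audio representations at 12.5 Hz. A Qwen3-1.7B decoder then generates the text. A dynamic flash-attention window of 1 s to 8 s lets the same model run in both streaming and offline mode. The "1.7B" in the model name refers to the decoder; together with the 300M encoder, the model totals roughly 2B parameters.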
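The frame-rate figures above imply some useful arithmetic: an 8x reduction down to 12.5 Hz means the encoder consumes features at 100 frames per second, and the decoder sees about 12.5 audio frames per second of speech. The sketch below only restates that arithmetic; `encoderFrames` is a hypothetical helper, and rounding partial frames up is an assumption about edge handling.

```typescript
// Figures taken from the architecture description: 8x downsampling to 12.5 Hz.
const ENCODER_OUTPUT_HZ = 12.5;
const DOWNSAMPLE_FACTOR = 8;
// Implied input feature rate: 12.5 * 8 = 100 frames per second.
const INPUT_FEATURE_HZ = ENCODER_OUTPUT_HZ * DOWNSAMPLE_FACTOR;

// Hypothetical helper: approximate number of encoder frames for a clip.
// Rounding a partial frame up is an assumption; real padding may differ.
function encoderFrames(durationSec: number): number {
  return Math.ceil(durationSec * ENCODER_OUTPUT_HZ);
}

console.log(INPUT_FEATURE_HZ);  // → 100
console.log(encoderFrames(8));  // → 100 (the widest 8 s attention window)
console.log(encoderFrames(30)); // → 375
```

The low 12.5 Hz rate keeps sequences short: even the widest 8-second window corresponds to only about 100 encoder frames for the decoder to attend over.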

Mixpeek SDK Integration

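import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest a media file and transcribe it with Qwen3-ASR-1.7B.
// Top-level await requires an ES module context.
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/multilingual-video.mp4" },
  feature_extractors: [{
    name: "transcription",
    version: "v1",
    params: {
      model_id: "Qwen/Qwen3-ASR-1.7B"
    }
  }]
});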
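To transcribe several recordings, the same request can be built once per URL. This is a sketch under stated assumptions: `IngestClient` and `IngestRequest` are hypothetical types that mirror only the call shape shown in the snippet above (they are not the SDK's published typings), and `transcriptionRequest` and `ingestAll` are illustrative helpers.

```typescript
// Hypothetical types mirroring only the ingest call shown above.
// These are NOT the Mixpeek SDK's published type definitions.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: Array<{
    name: string;
    version: string;
    params: { model_id: string };
  }>;
}

interface IngestClient {
  collections: { ingest(req: IngestRequest): Promise<unknown> };
}

// Hypothetical helper: build the transcription request for one URL.
function transcriptionRequest(collectionId: string, url: string): IngestRequest {
  return {
    collection_id: collectionId,
    source: { url },
    feature_extractors: [{
      name: "transcription",
      version: "v1",
      params: { model_id: "Qwen/Qwen3-ASR-1.7B" },
    }],
  };
}

// Hypothetical helper: ingest recordings one at a time, in order.
async function ingestAll(
  client: IngestClient,
  collectionId: string,
  urls: string[],
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const url of urls) {
    results.push(await client.collections.ingest(transcriptionRequest(collectionId, url)));
  }
  return results;
}
```

Assuming the real `mx` client structurally matches `IngestClient`, a call would look like `await ingestAll(mx, "my-collection", urls)`. Awaiting each request in turn keeps load predictable; `Promise.all` over the URLs is the concurrent alternative if rate limits allow.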