    HF · Scene Captioning · Apache 2.0

    Qwen3.6-27B

    by Qwen

    Dense 27B multimodal model with flagship-level coding and vision

    2.4M downloads/month
    27B parameters
    Identifiers
    Model ID: Qwen/Qwen3.6-27B
    Feature URI: mixpeek://image_extractor@v1/qwen36_27b_v1

    Overview

    Qwen3.6-27B is Alibaba's dense 27-billion-parameter multimodal model. A single checkpoint supports both thinking and non-thinking vision-language modes. Despite being a dense model, it surpasses the previous 397B MoE flagship (Qwen3.5-397B-A17B) on every major coding benchmark and delivers strong vision understanding.

    On Mixpeek, Qwen3.6-27B is the most powerful open-source captioning and visual reasoning model available, ideal for complex scene understanding, code extraction from screenshots, and detailed document analysis where accuracy matters more than throughput.

    Architecture

    64-layer dense language model using a hybrid layout: 16 repeats of (3x Gated DeltaNet + FFN, 1x Gated Attention + FFN), with hidden dimension 5120 and FFN intermediate size 17408. Supports a 262K-token native context, extensible to ~1M tokens via YaRN. Trained with multi-token prediction.
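
The layer arithmetic above can be checked with a short sketch. It is illustrative only: the type and function names are hypothetical and do not come from any Qwen or Mixpeek library, and the ordering inside each repeat (three Gated DeltaNet layers followed by one Gated Attention layer) is assumed from the order in which the description lists them.

```typescript
// Hypothetical types for illustration; not part of any Qwen or Mixpeek API.
type MixerKind = "gated_deltanet" | "gated_attention";

interface LayerSpec {
  index: number;          // position in the stack, starting at 0
  mixer: MixerKind;       // token mixer used by this layer
  hiddenDim: number;      // model hidden dimension
  ffnIntermediate: number; // FFN intermediate size
}

// Build the stack as `repeats` groups of (3x Gated DeltaNet, 1x Gated Attention).
// The within-group order is an assumption based on the description.
function buildLayout(
  repeats = 16,
  deltaNetPerRepeat = 3,
  attentionPerRepeat = 1,
): LayerSpec[] {
  const layers: LayerSpec[] = [];
  const push = (mixer: MixerKind): void => {
    layers.push({ index: layers.length, mixer, hiddenDim: 5120, ffnIntermediate: 17408 });
  };
  for (let r = 0; r < repeats; r++) {
    for (let i = 0; i < deltaNetPerRepeat; i++) push("gated_deltanet");
    for (let i = 0; i < attentionPerRepeat; i++) push("gated_attention");
  }
  return layers;
}

const layout = buildLayout();
const deltaNetCount = layout.filter((l) => l.mixer === "gated_deltanet").length;
const attentionCount = layout.length - deltaNetCount;

console.log(layout.length, deltaNetCount, attentionCount); // 64 48 16
```

Under these assumptions, three quarters of the 64 layers use Gated DeltaNet and every fourth layer uses Gated Attention.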

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Create a client; replace "API_KEY" with your own key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a video and caption its scenes with Qwen3.6-27B.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/video.mp4" },
      feature_extractors: [{
        name: "scene_description",
        version: "v1",
        params: {
          model_id: "Qwen/Qwen3.6-27B"
        }
      }]
    });
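
When the same request is issued for many sources, it can help to build the payload in one place. The following is a minimal sketch that uses only the fields shown in the snippet above. The interface `SceneCaptionIngestRequest` and the helper `buildSceneCaptionRequest` are hypothetical names introduced for illustration, not part of the Mixpeek SDK, and the URL check is an assumption about what a caller might want to validate.

```typescript
// Hypothetical request shape, mirroring only the fields shown in the SDK snippet.
interface SceneCaptionIngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: Array<{
    name: string;
    version: string;
    params: { model_id: string };
  }>;
}

// Hypothetical helper: builds the ingest payload for one source URL.
function buildSceneCaptionRequest(
  collectionId: string,
  url: string,
  modelId = "Qwen/Qwen3.6-27B",
): SceneCaptionIngestRequest {
  // Assumed validation: only accept http(s) URLs before sending anything.
  if (!/^https?:\/\//.test(url)) {
    throw new Error(`Unsupported source URL: ${url}`);
  }
  return {
    collection_id: collectionId,
    source: { url },
    feature_extractors: [
      { name: "scene_description", version: "v1", params: { model_id: modelId } },
    ],
  };
}

const request = buildSceneCaptionRequest(
  "my-collection",
  "https://example.com/video.mp4",
);
console.log(JSON.stringify(request, null, 2));
```

The resulting object has the same shape as the argument passed to `mx.collections.ingest` in the snippet above, so it could be handed to that call directly.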

    Capabilities

    • Vision-language thinking and non-thinking modes in one checkpoint
    • 262K native context window (extensible to ~1M tokens)
    • Flagship-level agentic coding (SWE-bench Verified: 77.2)
    • Strong visual understanding (MMMU: 82.9, VideoMME: 87.7)
    • Fits on a single consumer GPU with Q4_K_M quantization (16.8 GB)

    Use Cases on Mixpeek

    • Complex visual scene analysis requiring deep reasoning across video content
    • Code extraction and understanding from screenshots and technical documentation
    • High-accuracy document analysis for legal, financial, and scientific content

    Benchmarks

    Dataset             Metric        Score   Source
    MMMU                Accuracy      82.9%   Qwen3.6-27B blog post, April 2026
    SWE-bench Verified  Resolve Rate  77.2%   Qwen3.6-27B blog post, April 2026
    GPQA Diamond        Accuracy      87.8%   Qwen3.6-27B blog post, April 2026

    Performance

    Input Size: Text + variable-resolution images/video
    GPU Latency: ~120 ms / image (A100)
    GPU Throughput: ~8 images/sec (A100)
    GPU Memory: ~54 GB (bf16), ~16.8 GB (Q4_K_M)
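
The memory and throughput figures above can be related with some back-of-the-envelope arithmetic. This is a rough sketch, not a sizing tool: it assumes exactly 27e9 parameters, counts model weights only (no KV cache or activations), and uses decimal gigabytes. All function names are illustrative.

```typescript
const PARAMS = 27e9; // assumed exact parameter count for the estimate

// Weight memory in decimal GB for a given number of bits per weight.
function weightMemoryGB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e9;
}

// bf16 stores 16 bits per weight.
const bf16GB = weightMemoryGB(PARAMS, 16);

// Effective bits per weight implied by the listed 16.8 GB Q4_K_M footprint.
const q4BitsPerWeight = (16.8e9 * 8) / PARAMS;

// Wall-clock seconds to caption `images` images at a steady rate on one GPU.
function secondsForBatch(images: number, imagesPerSec = 8): number {
  return images / imagesPerSec;
}

// Throughput implied by the listed ~120 ms per-image latency.
const impliedImagesPerSec = 1 / 0.12;

console.log(bf16GB);                         // 54
console.log(q4BitsPerWeight.toFixed(2));     // "4.98"
console.log(secondsForBatch(1000));          // 125
console.log(impliedImagesPerSec.toFixed(1)); // "8.3"
```

The bf16 estimate matches the listed ~54 GB, which suggests that figure refers to weights alone; real deployments typically need extra headroom for the KV cache and activations. The ~120 ms latency and ~8 images/sec throughput are also consistent with each other.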

    Specification

    Framework: HF
    Organization: Qwen
    Feature: Scene Captioning
    Output: text
    Modalities: video, image
    Retriever: Semantic Search
    Parameters: 27B
    License: Apache 2.0
    Downloads/mo: 2.4M

    Research Paper

    Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

    arxiv.org

    Build a pipeline with Qwen3.6-27B

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
