    HF · Text Embeddings · MIT

    pplx-embed-v1-0.6b

    by perplexity-ai

    Diffusion-pretrained 0.6B text embeddings with INT8 quantization — SOTA at sub-1B scale

    120K downloads/month
    0.6B params
    Identifiers
    Model ID
    perplexity-ai/pplx-embed-v1-0.6b
    Feature URI
    mixpeek://text_extractor@v1/perplexity_pplx_embed_v1_06b

    Overview

    pplx-embed-v1-0.6B is Perplexity AI's lightweight text embedding model. It is built on a Qwen3 base that received diffusion-based continued pre-training, and it uses bidirectional attention. The model natively produces INT8-quantized embeddings, which cuts storage by 4x relative to FP32 while maintaining retrieval quality. At 0.6B parameters, it scores 68.6 nDCG@10 on MTEB Retrieval, ahead of the same-size Qwen3-Embed-0.6B (61.2) and BGE-M3 (62.3).

    The model supports 32K context length and 1024-dimensional embeddings, with optional binary quantization for 32x storage reduction. On Mixpeek, pplx-embed provides a fast, storage-efficient embedding backbone for text-heavy retrieval pipelines where index size and inference cost are primary constraints.
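
    The 4x and 32x figures follow directly from bits per dimension (32 for FP32, 8 for INT8, 1 for binary). The sketch below works through that arithmetic for 1024-dimensional vectors. The corpus size and the helper names (`bytesPerVector`, `indexSizeGiB`) are illustrative assumptions, and the estimate ignores any index overhead a vector store adds.

```typescript
// Storage arithmetic for one 1024-dim embedding at different precisions.
type Precision = "fp32" | "int8" | "binary";

const BITS_PER_DIM: Record<Precision, number> = { fp32: 32, int8: 8, binary: 1 };

// Raw bytes for a single vector (no index or metadata overhead).
function bytesPerVector(dims: number, precision: Precision): number {
  return Math.ceil((dims * BITS_PER_DIM[precision]) / 8);
}

// Raw size of a whole collection in GiB.
function indexSizeGiB(numVectors: number, dims: number, precision: Precision): number {
  return (numVectors * bytesPerVector(dims, precision)) / 1024 ** 3;
}

// Hypothetical corpus size, chosen only for illustration.
const NUM_VECTORS = 10_000_000;
const DIMS = 1024;

for (const p of ["fp32", "int8", "binary"] as const) {
  console.log(
    p,
    `${bytesPerVector(DIMS, p)} bytes/vector`,
    `${indexSizeGiB(NUM_VECTORS, DIMS, p).toFixed(2)} GiB total`,
  );
}
```

    Per vector this gives 4096 bytes for FP32, 1024 bytes for INT8, and 128 bytes for binary, which matches the 4x and 32x reductions quoted above.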

    Architecture

    Bidirectional attention transformer built on diffusion continued pre-trained Qwen3. 0.6B parameters. 32K context length. Natively produces INT8-quantized 1024-dimensional embeddings. Supports binary quantization for 32x storage reduction.
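
    To make the binary option concrete, here is a minimal sketch of one common binarization scheme: keep only the sign of each component, pack eight signs per byte, and compare vectors by Hamming distance. Thresholding at zero is an assumption made for illustration; the source does not state how the model or Mixpeek derives binary embeddings, and `binarize` and `hammingDistance` are hypothetical helper names.

```typescript
// Pack the sign of each INT8 component into one bit (1 if value > 0, else 0).
// Assumption: zero is used as the threshold; the real scheme may differ.
function binarize(embedding: Int8Array): Uint8Array {
  const packed = new Uint8Array(Math.ceil(embedding.length / 8));
  for (let i = 0; i < embedding.length; i++) {
    if (embedding[i] > 0) {
      packed[i >> 3] |= 1 << (i & 7); // bit (i mod 8) of byte (i div 8)
    }
  }
  return packed;
}

// Number of differing bits between two packed vectors (lower = more similar).
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("length mismatch");
  let dist = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x !== 0) {
      x &= x - 1; // clear the lowest set bit
      dist++;
    }
  }
  return dist;
}

// A 1024-dim INT8 vector packs into 128 bytes.
const example = new Int8Array(1024).fill(1);
console.log(binarize(example).length);
```

    Hamming distance over packed bits is cheap to compute, which is why binary codes are often used for a coarse first-pass search before rescoring with the INT8 vectors.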

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Managed: create a collection over a bucket; Mixpeek runs this model's extractor
    const collection = await mx.collections.create({
      namespace_id: "my-namespace",
      collection_name: "my-collection",
      source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
      feature_extractor: {
        feature_extractor_name: "text_embedding",
        version: "v1",
        parameters: { model_id: "perplexity-ai/pplx-embed-v1-0.6b" },
      },
    });

    Capabilities

    • 68.6 nDCG@10 on MTEB Retrieval — SOTA at sub-1B scale
    • Native INT8 quantization (4x storage reduction)
    • Optional binary embeddings (32x storage reduction)
    • 32K context window for long documents
    • Beats BGE-M3 and Qwen3-Embed-0.6B on retrieval benchmarks

    Use Cases on Mixpeek

    Storage-efficient text indexing: embed large document collections with minimal disk footprint
    High-throughput RAG pipelines: fast embedding generation for retrieval-augmented generation
    Cost-sensitive text search: strong retrieval quality at minimal compute and storage cost

    Benchmarks

    Dataset | Metric | Score | Source
    MTEB Retrieval (en) | nDCG@10 | 68.6 | Perplexity AI, 2026 (arxiv:2602.11151)
    BERGEN End-to-End RAG | Avg score | Beats Qwen3-embedding-4B on 3/5 tasks | Perplexity AI, 2026 (arxiv:2602.11151)
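
    For readers unfamiliar with the metric, the sketch below computes nDCG@10 for a single query. It uses the linear-gain form of DCG, which is an assumption about the exact variant behind the reported number. Benchmark figures such as 68.6 are typically this per-query value averaged over queries and datasets and scaled by 100. The function names are illustrative.

```typescript
// DCG@k: each relevance grade is discounted by log2(rank + 1), ranks start at 1.
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the system's ranking divided by the DCG of the ideal ranking.
// rankedRelevances: grades of the retrieved documents, in retrieved order.
// allRelevances: grades of every judged document for the query.
function ndcgAtK(rankedRelevances: number[], allRelevances: number[], k = 10): number {
  const ideal = dcgAtK([...allRelevances].sort((a, b) => b - a), k);
  return ideal === 0 ? 0 : dcgAtK(rankedRelevances, k) / ideal;
}

// Two relevant documents exist; the system ranks them 1st and 3rd.
console.log(ndcgAtK([1, 0, 1], [1, 1, 0]).toFixed(3));
```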

    Performance

    Input Size: Up to 32K tokens
    Embedding Dim: 1024 (INT8 native)
    GPU Latency: ~5ms / passage (A100)
    GPU Throughput: ~2000 passages/sec (A100, batch 128)
    GPU Memory: ~1.2 GB
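
    These figures can be turned into a rough capacity estimate. The sketch below assumes the quoted ~2000 passages/sec holds steadily and scales linearly with GPU count, which real workloads may not achieve; `estimateEmbeddingHours` is a hypothetical helper, not part of any SDK.

```typescript
// Approximate figures taken from the performance numbers above (A100).
const PASSAGES_PER_SEC_BATCHED = 2000; // batch size 128
const LATENCY_SEC_PER_PASSAGE = 0.005; // one passage at a time

// Hours needed to embed a corpus, assuming linear scaling across GPUs.
function estimateEmbeddingHours(numPassages: number, gpus = 1): number {
  return numPassages / (PASSAGES_PER_SEC_BATCHED * gpus) / 3600;
}

// Sequential single-passage calls imply about 200 passages/sec,
// so batching accounts for roughly a 10x throughput gain.
const sequentialPassagesPerSec = 1 / LATENCY_SEC_PER_PASSAGE;

console.log(estimateEmbeddingHours(10_000_000).toFixed(2), "hours on one GPU");
console.log(sequentialPassagesPerSec, "passages/sec without batching");
```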

    Specification

    Framework: HF
    Organization: perplexity-ai
    Feature: Text Embeddings
    Output: 1024-dim vector
    Modalities: document, audio
    Retriever: Text Similarity
    Parameters: 0.6B
    License: MIT
    Downloads/mo: 120K

    Research Paper

    pplx-embed: State-of-the-Art Embedding Models for Web-Scale Retrieval

    arxiv.org

    Build a pipeline with pplx-embed-v1-0.6b

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
