    HF | Text Embeddings | MIT

    pplx-embed-v1-late-0.6b

    by perplexity-ai

    Late-interaction (ColBERT-style) embedding model from Perplexity AI

    4.9K downloads/month
    0.6B parameters
    Identifiers
    Model ID: perplexity-ai/pplx-embed-v1-late-0.6b
    Feature URI: mixpeek://text_extractor@v1/perplexity_pplx_embed_late_06b_v1

    Overview

    pplx-embed-v1-late is a 0.6B-parameter late-interaction embedding model from Perplexity AI that uses ColBERT-style token-level representations with MaxSim scoring. Unlike dense single-vector embeddings, it produces a 128-dimensional vector for each token, enabling fine-grained matching that captures partial document relevance. It outperforms ColBERT-zero on BEIR (56.61 nDCG@10) and jina-colbert-v2 on MIRACL multilingual retrieval (66.62 nDCG@10).

    Architecture

    The model uses a late-interaction architecture built on the pplx-embed-v1-0.6b backbone. It produces a 128-dimensional vector per token instead of a single document vector. Scoring uses MaxSim: for each query token, take the maximum similarity to any document token, then sum those maxima across all query tokens. This enables fine-grained partial matching that dense embeddings miss. Optimized CUDA and Metal kernels are available for efficient scoring.
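The MaxSim rule described above can be sketched in a few lines. This is a minimal illustration, not the model's optimized kernel: it assumes dot-product similarity, and the names `TokenVectors`, `dot`, and `maxSim` are hypothetical helpers, not part of any Mixpeek or Perplexity API. Toy 2-dimensional vectors stand in for the model's 128-dimensional token vectors.

```typescript
// Hypothetical helper type: one row per token (128 values each for this model).
type TokenVectors = number[][];

// Dot product used here as the similarity measure (an assumption of this sketch).
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// MaxSim: for each query token, keep the best-matching document token, then sum.
function maxSim(query: TokenVectors, doc: TokenVectors): number {
  if (doc.length === 0) return 0; // nothing to match against
  let score = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) {
      const s = dot(q, d);
      if (s > best) best = s;
    }
    score += best;
  }
  return score;
}

// Toy example: two query tokens, two document tokens.
const queryTokens: TokenVectors = [[1, 0], [0, 1]];
const docTokens: TokenVectors = [[1, 0], [0.6, 0.8]];
const exampleScore = maxSim(queryTokens, docTokens);
console.log(exampleScore);
```

The first query token matches the first document token exactly, and the second query token matches the second document token only partially. Each still contributes its own best match to the total, which is how the score captures partial relevance.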

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Managed: create a collection over a bucket; Mixpeek runs this model's extractor
    const collection = await mx.collections.create({
      namespace_id: "my-namespace",
      collection_name: "my-collection",
      source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
      feature_extractor: {
        feature_extractor_name: "text_embeddings",
        version: "v1",
        parameters: { model_id: "perplexity-ai/pplx-embed-v1-late-0.6b" },
      },
    });

    Capabilities

    • Fine-grained token-level document matching via MaxSim
    • Better partial relevance detection than dense embeddings
    • Multilingual retrieval (strong MIRACL performance)
    • Compatible with existing ColBERT indexing infrastructure
    • Optimized GPU/Metal kernels for production scoring

    Use Cases on Mixpeek

    • High-precision document retrieval where partial matches matter
    • Legal and medical document search requiring exact phrase matching
    • Multilingual retrieval across diverse language pairs
    • Two-stage retrieval: dense first stage, late-interaction reranking
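The two-stage pattern in the last use case can be sketched as follows. This is an illustration under stated assumptions, not Mixpeek's retriever implementation: the `RerankCandidate` shape, `maxSimScore`, and `rerank` are hypothetical names, the dense scores are assumed to come from some single-vector first stage, and similarity is a plain dot product.

```typescript
// Hypothetical candidate shape for illustration; not a Mixpeek API type.
interface RerankCandidate {
  id: string;
  denseScore: number;       // first-stage score from a single-vector retriever
  tokenVectors: number[][]; // per-token vectors from the late-interaction model
}

// MaxSim with dot-product similarity (an assumption of this sketch).
function maxSimScore(query: number[][], doc: number[][]): number {
  let total = 0;
  for (const q of query) {
    let best = doc.length > 0 ? -Infinity : 0;
    for (const d of doc) {
      let s = 0;
      for (let i = 0; i < q.length; i++) s += q[i] * d[i];
      best = Math.max(best, s);
    }
    total += best;
  }
  return total;
}

// Stage 1: keep the `shortlist` best candidates by dense score.
// Stage 2: rescore only that shortlist with MaxSim and return the top `topK`.
function rerank(
  queryTokens: number[][],
  candidates: RerankCandidate[],
  shortlist: number,
  topK: number,
): { id: string; score: number }[] {
  const stageOne = [...candidates]
    .sort((a, b) => b.denseScore - a.denseScore)
    .slice(0, shortlist);
  return stageOne
    .map((c) => ({ id: c.id, score: maxSimScore(queryTokens, c.tokenVectors) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The expensive token-level scoring runs only on the shortlist, so its cost scales with `shortlist` rather than with the corpus size. The trade-off is that a document the dense stage ranks below the cutoff is never rescored, however well its tokens match.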

    Benchmarks

    Dataset   Metric     Score   Notes
    BEIR      nDCG@10    56.61   Beats ColBERT-zero
    MIRACL    nDCG@10    66.62   Beats jina-colbert-v2 on multilingual retrieval
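For readers unfamiliar with the metric, nDCG@10 for a single query can be computed roughly as below. This sketch assumes the linear-gain form with a log2 rank discount; some evaluators use an exponential gain instead, and the exact evaluation setup behind the scores above is not stated here. The function names are hypothetical. Benchmark figures such as 56.61 are presumably this value averaged over queries and multiplied by 100.

```typescript
// DCG@k with linear gain: sum of rel_i / log2(rank_i + 1), ranks starting at 1.
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the system ranking divided by the DCG of the ideal ranking.
// `ranked` holds the relevance labels of the returned documents, in rank order.
// `judged` holds the relevance labels of all judged documents for the query.
function ndcgAtK(ranked: number[], judged: number[], k: number): number {
  const ideal = [...judged].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(ranked, k) / idealDcg;
}

// Example: relevant documents returned at ranks 1 and 3, with two relevant documents judged.
const exampleNdcg = ndcgAtK([1, 0, 1], [1, 1, 0], 10);
console.log(exampleNdcg.toFixed(4));
```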

    Performance

    Input Size: Variable
    GPU Latency: Input dependent
    GPU Throughput: ~500 documents/sec (A100, batch 64)
    GPU Memory: ~1.5 GB

    Specification

    Framework: HF
    Organization: perplexity-ai
    Feature: Text Embeddings
    Output: 1024-dim vector
    Modalities: document, audio
    Retriever: Text Similarity
    Parameters: 0.6B
    License: MIT
    Downloads/mo: 4.9K

    Build a pipeline with pplx-embed-v1-late-0.6b

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
