    HF · Text Embeddings · Apache-2.0

    GTE-ModernColBERT-v1

    by lightonai

    Late interaction retrieval model with record-breaking long-context performance

    119K downloads/month
    149M parameters

    Identifiers

    Model ID: lightonai/GTE-ModernColBERT-v1
    Feature URI: mixpeek://text_extractor@v1/lighton_gte_moderncolbert_v1

    Overview

    GTE-ModernColBERT-v1 is a ColBERT-style late interaction retrieval model built on the ModernBERT architecture. Instead of compressing an entire document into a single vector, it produces a 128-dimensional embedding for every token, then scores query-document pairs with MaxSim: for each query token, it finds the best-matching document token, and the final score is the sum of those per-token maxima. This token-level matching preserves fine-grained detail that single-vector models lose.
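
The MaxSim scoring described above can be sketched in a few lines. This is a minimal illustration, not the PyLate or Mixpeek implementation: the `TokenEmbedding` type and the `dot` and `maxSim` helpers are hypothetical names, and toy 2-dimensional vectors stand in for the model's 128-dimensional token embeddings.

```typescript
// Illustrative MaxSim scoring. All names are hypothetical, not a library API.
type TokenEmbedding = number[];

function dot(a: TokenEmbedding, b: TokenEmbedding): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// For each query token, keep the best-matching document token, then sum.
// Assumes the document has at least one token.
function maxSim(query: TokenEmbedding[], doc: TokenEmbedding[]): number {
  let score = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) best = Math.max(best, dot(q, d));
    score += best;
  }
  return score;
}

// Toy 2-dim unit vectors stand in for 128-dim token embeddings.
const query: TokenEmbedding[] = [[1, 0], [0, 1]];
const docA: TokenEmbedding[] = [[1, 0], [0, 1], [0.6, 0.8]];
const docB: TokenEmbedding[] = [[0.6, 0.8]];

const scoreA = maxSim(query, docA); // each query token has an exact match in docA
const scoreB = maxSim(query, docB); // both query tokens share one partial match
console.log({ scoreA, scoreB });
```

Here `docA` scores higher because every query token finds an exact counterpart, whereas `docB` offers only one token that partially matches both.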

    The model's standout capability is long-context retrieval. On the LongEmbed benchmark (documents up to 32K tokens), it reaches a mean score of 88.39, roughly 10 points above the previous state of the art. It also outperforms ColBERT-small on BEIR. It was trained in just 15K steps on MS MARCO using LightOn's PyLate library, showing that the ModernBERT + ColBERT recipe can produce competitive results with little training compute.

    Architecture

    The model is a ModernBERT encoder (from Alibaba-NLP/gte-modernbert-base) followed by a linear projection layer (768 → 128 dimensions, no bias, no activation), so each token yields one 128-dim embedding. The default query length is 32 tokens; the default document length is 300 tokens, extensible up to 32K tokens. Query-document pairs are scored with the MaxSim operator. The model was trained with knowledge distillation on MS MARCO using PyLate.
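
To make the projection step concrete, here is a minimal sketch of a bias-free linear layer applied to each token's hidden state. The weights and hidden states are toy values (4 → 2 instead of 768 → 128), and the helper names are hypothetical. The L2 normalization at the end is an assumption based on common ColBERT practice; the text above does not state it.

```typescript
// Hypothetical helper: bias-free linear projection y = W x (no activation).
function project(weights: number[][], hidden: number[]): number[] {
  return weights.map((row) => row.reduce((acc, w, i) => acc + w * hidden[i], 0));
}

// Assumption: token embeddings are L2-normalized so dot products act as cosine similarity.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Toy 4 -> 2 projection; the real layer maps 768 -> 128.
const W: number[][] = [
  [1, 0, 0, 0],
  [0, 2, 0, 0],
]; // shape [outDim][inDim]

// One row of encoder hidden states per token (toy values).
const hiddenStates: number[][] = [
  [3, 2, 9, 9],
  [0, 0, 5, 5],
];

// One low-dimensional embedding per token, rather than one vector per document.
const tokenEmbeddings = hiddenStates.map((h) => l2Normalize(project(W, h)));
console.log(tokenEmbeddings);
```

The output keeps one row per input token, which is what allows token-level scoring later.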

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Managed: create a collection over a bucket; Mixpeek runs this model's extractor
    const collection = await mx.collections.create({
      namespace_id: "my-namespace",
      collection_name: "my-collection",
      source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
      feature_extractor: {
        feature_extractor_name: "text_embedding",
        version: "v1",
        parameters: { model_id: "lightonai/GTE-ModernColBERT-v1" },
      },
    });

    Capabilities

    • Late interaction retrieval with per-token 128-dim embeddings
    • Long-context support up to 32K tokens (tested to 32,768)
    • 88.39 mean on LongEmbed benchmark (~10 points above prior SOTA)
    • 54.75 NDCG@10 on BEIR, outperforming ColBERT-small
    • Apache 2.0 license, reproducible training with PyLate

    Use Cases on Mixpeek

    • Precision retrieval for entity-rich queries in Mixpeek multi-stage pipelines
    • Long-document search where single-vector compression loses detail
    • Second-stage rescoring after dense retrieval for factoid and exact-match queries
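
The second-stage rescoring use case can be sketched as follows. This is a generic illustration under stated assumptions, not Mixpeek's pipeline API: the `Candidate` shape, `lateInteractionScore`, and `rescore` are hypothetical, and the token embeddings are toy 2-dimensional values. A first-stage dense retriever supplies candidates, and the late interaction score then reorders them.

```typescript
// Hypothetical shapes for illustration; not the Mixpeek or PyLate API.
interface Candidate {
  id: string;
  denseScore: number; // first-stage score from a single-vector retriever
  tokens: number[][]; // per-token embeddings (128-dim in the real model)
}

// MaxSim: best document-token match per query token, summed over query tokens.
function lateInteractionScore(queryTokens: number[][], docTokens: number[][]): number {
  let total = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) {
      let sim = 0;
      for (let i = 0; i < q.length; i++) sim += q[i] * d[i];
      if (sim > best) best = sim;
    }
    total += best;
  }
  return total;
}

// Rescore first-stage candidates and keep the topK by late interaction score.
function rescore(queryTokens: number[][], candidates: Candidate[], topK: number) {
  return candidates
    .map((c) => ({ id: c.id, score: lateInteractionScore(queryTokens, c.tokens) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const queryTokens: number[][] = [[1, 0], [0, 1]];
const firstStage: Candidate[] = [
  { id: "a", denseScore: 0.9, tokens: [[0.6, 0.8]] },
  { id: "b", denseScore: 0.7, tokens: [[1, 0], [0, 1]] },
];

const reranked = rescore(queryTokens, firstStage, 2);
console.log(reranked.map((r) => r.id));
```

In this toy case, candidate "b" ranks below "a" in the dense stage but moves to the top after rescoring, because each query token finds an exact token-level match in it.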

    Benchmarks

    Dataset                  Metric      Score  Source
    BEIR (15 datasets)       NDCG@10     54.75  LightOn, 2025 (Model Card)
    LongEmbed (32K context)  Mean Score  88.39  LightOn, 2025 (Blog Post)
    NanoBEIR                 NDCG@10     67.58  LightOn, 2025 (Model Card)

    Performance

    Input Size: Up to 32,768 tokens (default 300, extensible)
    Embedding Dim: 128 per token
    GPU Latency: ~12 ms / document (A100, 300 tokens)
    GPU Throughput: ~800 documents/sec (A100, batch 64)
    GPU Memory: ~0.6 GB
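
Because the model emits one 128-dim vector per token, index size grows with document length. The arithmetic below is a rough sketch: `multiVectorBytes` is a hypothetical helper, and the 2 bytes per value (float16 storage) is an assumption, not a documented Mixpeek setting. It ignores any compression or index overhead.

```typescript
// Hypothetical helper: raw storage for one document's token embeddings.
// Assumption: float16 storage (2 bytes per value), no compression or index overhead.
function multiVectorBytes(tokens: number, dim: number = 128, bytesPerValue: number = 2): number {
  return tokens * dim * bytesPerValue;
}

const shortDocBytes = multiVectorBytes(300); // default document length
const longDocBytes = multiVectorBytes(32768); // maximum input size

console.log(shortDocBytes); // 76800 bytes, i.e. 75 KiB
console.log(longDocBytes / (1024 * 1024)); // 8 MiB
```

Under these assumptions a 300-token document needs about 75 KiB and a full 32,768-token document about 8 MiB, which is worth budgeting for when indexing long documents.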

    Specification

    Framework: HF
    Organization: lightonai
    Feature: Text Embeddings
    Output: 128-dim vector per token
    Modalities: document, audio
    Retriever: Text Similarity
    Parameters: 149M
    License: Apache-2.0
    Downloads/mo: 119K

    Research Paper

    LightOn Releases GTE-ModernColBERT, First SOTA Late-Interaction Model Trained on PyLate

    arxiv.org

    Build a pipeline with GTE-ModernColBERT-v1

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
