GTE-ModernColBERT-v1
by lightonai
Late interaction retrieval model with record-breaking long-context performance
lightonai/GTE-ModernColBERT-v1
mixpeek://text_extractor@v1/lighton_gte_moderncolbert_v1
Overview
GTE-ModernColBERT-v1 is a ColBERT-style late interaction retrieval model built on the ModernBERT architecture. Instead of compressing an entire document into a single vector, it produces 128-dimensional embeddings for every token, then scores query-document pairs using MaxSim — for each query token, find the best-matching document token and sum the scores. This token-level matching preserves fine-grained detail that single-vector models lose.
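Here is a minimal TypeScript sketch of the MaxSim operator. It assumes the token embeddings are already L2-normalized, which is the usual ColBERT convention, so a dot product equals cosine similarity. It also uses toy 2-dimensional vectors in place of the model's 128-dimensional ones. The names `maxSim` and `rank` are illustrative and do not come from any SDK.

```typescript
// One row per token; each row is that token's embedding (128 floats for this model).
type TokenEmbeddings = number[][];

// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// MaxSim: for each query token, keep its best-matching document token,
// then sum those maxima over all query tokens.
function maxSim(query: TokenEmbeddings, doc: TokenEmbeddings): number {
  let score = 0;
  for (const q of query) {
    let best = -Infinity; // stays -Infinity only if the document has no tokens
    for (const d of doc) best = Math.max(best, dot(q, d));
    score += best;
  }
  return score;
}

// Score every document against the query and sort highest first.
function rank(
  query: TokenEmbeddings,
  docs: Map<string, TokenEmbeddings>,
): Array<[string, number]> {
  return Array.from(docs.entries())
    .map(([id, emb]) => [id, maxSim(query, emb)] as [string, number])
    .sort((a, b) => b[1] - a[1]);
}

// Toy example: two query tokens, two candidate documents.
const query: TokenEmbeddings = [[1, 0], [0, 1]];
const docs = new Map<string, TokenEmbeddings>([
  ["a", [[1, 0], [0.5, 0.5]]], // 1 + 0.5 = 1.5
  ["b", [[0.6, 0.8]]], // 0.6 + 0.8, about 1.4
]);
console.log(rank(query, docs)); // "a" ranks above "b"
```

Each query token is matched independently. A document that covers every query term somewhere can therefore score well even when no single token matches them all. The cost is one dot product per query-token/document-token pair.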
The model's standout capability is long-context retrieval. On the LongEmbed benchmark (documents up to 32K tokens), it achieves a mean score of 88.39, roughly 10 points above the previous state of the art. It also outperforms ColBERT-small on BEIR while supporting documents up to 32K tokens natively. It was trained in just 15K steps on MS MARCO using LightOn's PyLate library, which shows that the ModernBERT + ColBERT recipe produces competitive results with minimal training compute.
Architecture
ModernBERT encoder (from Alibaba-NLP/gte-modernbert-base) with a linear projection layer (768 → 128 dimensions, no bias, no activation). Produces per-token 128-dim embeddings. Default query length 32 tokens, document length up to 32K tokens. Scoring via MaxSim operator. Trained with knowledge distillation on MS MARCO using PyLate.
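The projection head described above is a bias-free matrix multiply. It is applied separately to each token's 768-dim encoder output. The sketch below shows that step in plain TypeScript. The weight matrix `W` and the hidden states are deterministic placeholders rather than learned values. The final L2 normalization is an assumption based on common ColBERT practice and is not stated on this page.

```typescript
const HIDDEN_DIM = 768; // ModernBERT hidden size
const COLBERT_DIM = 128; // output size of the projection layer

// y = W h: a 128 x 768 weight matrix times one 768-dim hidden state.
// No bias term and no activation function, per the architecture description.
function project(W: number[][], h: number[]): number[] {
  if (W.length !== COLBERT_DIM || W.some((row) => row.length !== HIDDEN_DIM)) {
    throw new Error("W must be 128 x 768");
  }
  if (h.length !== HIDDEN_DIM) throw new Error("h must have 768 entries");
  return W.map((row) => row.reduce((sum, w, j) => sum + w * h[j], 0));
}

// Assumption: scale each token vector to unit length, as ColBERT models typically do,
// so MaxSim dot products behave like cosine similarities.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Map every token's hidden state to its own 128-dim vector.
function encodeTokens(W: number[][], hidden: number[][]): number[][] {
  return hidden.map((h) => l2Normalize(project(W, h)));
}

// Placeholder weights and a 5-token sequence of hidden states, for illustration only.
const W = Array.from({ length: COLBERT_DIM }, (_, i) =>
  Array.from({ length: HIDDEN_DIM }, (_, j) => Math.sin(i * HIDDEN_DIM + j)),
);
const hidden = Array.from({ length: 5 }, (_, t) =>
  Array.from({ length: HIDDEN_DIM }, (_, j) => Math.cos(t + j)),
);
const tokenVectors = encodeTokens(W, hidden);
console.log(tokenVectors.length, tokenVectors[0].length); // 5 128
```

Projecting to 128 dims makes each stored token vector one sixth the size of the encoder's 768-dim hidden state.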
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
// (top-level await requires running this file as an ES module)
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "text_embedding",
    version: "v1",
    parameters: { model_id: "lightonai/GTE-ModernColBERT-v1" },
  },
});
```
Capabilities
- Late interaction retrieval with per-token 128-dim embeddings
- Long-context support up to 32K tokens (tested to 32,768)
- 88.39 mean on LongEmbed benchmark (~10 points above prior SOTA)
- 54.75 NDCG@10 on BEIR — outperforms ColBERT-small
- Apache 2.0 license, reproducible training with PyLate
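The model keeps one vector per token, so index size grows linearly with document length. The arithmetic below estimates the raw, uncompressed storage for a single maximum-length document. The `multiVectorBytes` helper is hypothetical. It ignores padding, metadata, and any compression an index might apply, so treat the result as an upper bound.

```typescript
// Raw size of one document's per-token embeddings: tokens x dims x bytes per value.
function multiVectorBytes(numTokens: number, dim: number, bytesPerValue: number): number {
  return numTokens * dim * bytesPerValue;
}

const MiB = 1024 * 1024;
const maxTokens = 32_768; // maximum document length
const dim = 128; // per-token embedding size

const fp32Bytes = multiVectorBytes(maxTokens, dim, 4); // float32 values
const fp16Bytes = multiVectorBytes(maxTokens, dim, 2); // float16 values
console.log(fp32Bytes / MiB, fp16Bytes / MiB); // 16 8
```

A single-vector model stores one vector per document regardless of length. This storage cost is what late interaction pays for token-level matching.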
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| BEIR (15 datasets) | NDCG@10 | 54.75 | LightOn, 2025 — Model Card |
| LongEmbed (32K context) | Mean Score | 88.39 | LightOn, 2025 — Blog Post |
| NanoBEIR | NDCG@10 | 67.58 | LightOn, 2025 — Model Card |
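The BEIR and NanoBEIR rows report NDCG@10, which rewards placing relevant documents near the top of the first ten results. The minimal TypeScript sketch below uses linear gain (rel_i / log2(i + 1)). We assume this matches the trec_eval-style scoring BEIR uses, but other toolkits use exponential gain (2^rel - 1), so check which variant a leaderboard reports. The names `ndcgAtK` and `Qrels` are illustrative.

```typescript
// Relevance judgments for one query: docId -> graded relevance (0 = not relevant).
type Qrels = Record<string, number>;

// DCG with linear gain: sum of gain_i / log2(i + 1) over the top k positions (i starts at 1).
function dcg(gains: number[], k: number): number {
  return gains.slice(0, k).reduce((sum, g, idx) => sum + g / Math.log2(idx + 2), 0);
}

// NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking
// (all judged-relevant documents sorted by relevance, best first).
function ndcgAtK(ranked: string[], qrels: Qrels, k = 10): number {
  const gains = ranked.map((id) => qrels[id] ?? 0);
  const ideal = Object.values(qrels)
    .filter((r) => r > 0)
    .sort((a, b) => b - a);
  const idcg = dcg(ideal, k);
  return idcg === 0 ? 0 : dcg(gains, k) / idcg;
}

// Toy query with two relevant documents; d3 is retrieved one slot too late.
const qrels: Qrels = { d1: 2, d3: 1 };
console.log(ndcgAtK(["d1", "d2", "d3"], qrels)); // roughly 0.95
```

Averaging this per-query value over a dataset's queries, and then over datasets, gives the kind of aggregate shown in the table.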
Research Paper
LightOn Releases GTE-ModernColBERT, First SOTA Late-Interaction Model Trained on PyLate
arxiv.org
Build a pipeline with GTE-ModernColBERT-v1
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.