gliner2-base-v1
by fastino
Unified NER, classification, and structured extraction in a single 205M CPU-efficient model
fastino/gliner2-base-v1
mixpeek://document_extractor@v1/fastino_gliner2_base_v1
Overview
GLiNER 2 unifies named entity recognition, text classification, and hierarchical structured data extraction into a single 205M-parameter model built on a pretrained transformer encoder. Unlike pipeline approaches that chain separate models or LLM-based extraction that requires GPU infrastructure, GLiNER 2 runs efficiently on CPU with an intuitive schema-based interface that accepts natural language type descriptions.
On Mixpeek, GLiNER 2 powers lightweight entity extraction pipelines that run alongside heavier models without competing for GPU resources. Its zero-shot generalization across domains (F1 of 0.590 on CrossNER, versus 0.599 for GPT-4o) makes it well suited to extracting custom entities from transcripts, OCR output, and document text without fine-tuning.
Architecture
Pretrained transformer encoder with multi-task composition heads for NER, classification, and structured extraction. 205M parameters. Schema-driven interface supporting natural language entity type descriptions, nested and overlapping spans, and configurable single or multi-label classification.
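To make the schema-driven interface and the nested and overlapping spans more concrete, here is a minimal sketch of how a schema and its output spans might be modeled. All names (`EntityTypeSpec`, `Span`, `overlaps`, `isNested`) are hypothetical and chosen for illustration; they are not part of the GLiNER 2 or Mixpeek APIs.

```typescript
// Hypothetical types for illustration only; not GLiNER 2's or Mixpeek's actual API.

/** An entity type plus a natural language description of what it covers. */
interface EntityTypeSpec {
  name: string;
  description: string;
}

/** A labeled character span over the input text, half-open: [start, end). */
interface Span {
  label: string;
  start: number;
  end: number;
}

/** True when two spans share at least one character. */
function overlaps(a: Span, b: Span): boolean {
  return a.start < b.end && b.start < a.end;
}

/** True when `inner` lies entirely within `outer`. */
function isNested(inner: Span, outer: Span): boolean {
  return outer.start <= inner.start && inner.end <= outer.end;
}

const schema: EntityTypeSpec[] = [
  { name: "organization", description: "A company, agency, or institution" },
  { name: "location", description: "A city, region, or country" },
];

const text = "Bank of England";
// The whole phrase is an organization; "England" is a location nested inside it.
const org: Span = { label: "organization", start: 0, end: 15 };
const loc: Span = { label: "location", start: 8, end: 15 };

console.log(schema.map((s) => s.name));
console.log(text.slice(loc.start, loc.end)); // "England"
console.log(overlaps(org, loc), isNested(loc, org));
```

A flat tagging scheme would have to pick one of the two labels for "England"; allowing spans to nest lets both survive.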
Mixpeek SDK Integration
    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/document.pdf" },
      feature_extractors: [
        {
          name: "entity_extraction",
          version: "v1",
          params: {
            model_id: "fastino/gliner2-base-v1",
            entity_types: ["person", "organization", "product", "date"]
          }
        }
      ]
    });
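When entity types come from user input or configuration, it can help to build the `feature_extractors` entry through a small helper. The sketch below is an assumption-laden illustration: `EntityExtractorConfig` and `buildEntityExtractor` are hypothetical names, and only the object shape is taken from the snippet above. It makes no calls to the Mixpeek SDK.

```typescript
// Hypothetical helper; only the resulting object shape comes from the SDK snippet.
interface EntityExtractorConfig {
  name: string;
  version: string;
  params: { model_id: string; entity_types: string[] };
}

function buildEntityExtractor(entityTypes: string[]): EntityExtractorConfig {
  // Trim whitespace, drop empty labels, and remove duplicates while keeping order.
  const cleaned = entityTypes
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
    .filter((t, i, all) => all.indexOf(t) === i);

  if (cleaned.length === 0) {
    throw new Error("At least one entity type is required");
  }

  return {
    name: "entity_extraction",
    version: "v1",
    params: {
      model_id: "fastino/gliner2-base-v1",
      entity_types: cleaned,
    },
  };
}

const extractor = buildEntityExtractor([" person", "organization", "person", ""]);
console.log(extractor.params.entity_types);
```

The returned object could then be placed in the `feature_extractors` array of the ingest call.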
Capabilities
- Zero-shot NER close to GPT-4o on CrossNER (F1: 0.590 vs 0.599)
- Named entity recognition with natural language type descriptions
- Text classification with single or multi-label output
- Hierarchical structured data extraction
- CPU-efficient inference, no GPU required
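The difference between single and multi-label classification can be sketched as a post-processing step over per-label scores. This assumes the classifier yields one score between 0 and 1 per candidate label, which is a common setup but is not confirmed here for GLiNER 2; `pickLabels` and the 0.5 threshold are hypothetical.

```typescript
// Hypothetical post-processing; assumes one score in [0, 1] per candidate label.
type LabelScores = Record<string, number>;

function pickLabels(
  scores: LabelScores,
  multiLabel: boolean,
  threshold = 0.5
): string[] {
  // Rank labels from highest to lowest score.
  const ranked = Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
  if (ranked.length === 0) return [];

  // Multi-label: keep every label at or above the threshold.
  // Single-label: keep only the top-ranked label.
  return multiLabel
    ? ranked.filter((label) => scores[label] >= threshold)
    : [ranked[0]];
}

const scores: LabelScores = { billing: 0.81, complaint: 0.64, praise: 0.07 };

console.log(pickLabels(scores, false)); // ["billing"]
console.log(pickLabels(scores, true));  // ["billing", "complaint"]
```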
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| CrossNER (zero-shot, 5 domains) | F1 | 0.590 | GLiNER2, Jul 2025 — arXiv 2507.18546 |
| CrossNER AI domain | F1 | 0.547 | GLiNER2, Jul 2025 — arXiv 2507.18546 |
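As an aid to reading the F1 column, the sketch below computes entity-level F1 under exact matching, where a predicted span counts as correct only if its label and both boundaries match a gold span. This is a common convention for NER benchmarks and is assumed here; the scoring used in the cited paper may differ in detail. `LabeledSpan` and `f1Score` are hypothetical names.

```typescript
// Illustrative exact-match entity F1; not the benchmark's official scorer.
interface LabeledSpan {
  label: string;
  start: number;
  end: number;
}

/** Serialize spans to comparable keys and drop duplicates. */
function uniqueKeys(spans: LabeledSpan[]): string[] {
  return spans
    .map((s) => `${s.label}:${s.start}:${s.end}`)
    .filter((k, i, all) => all.indexOf(k) === i);
}

function f1Score(gold: LabeledSpan[], predicted: LabeledSpan[]): number {
  const goldKeys = uniqueKeys(gold);
  const predKeys = uniqueKeys(predicted);

  const truePositives = predKeys.filter((k) => goldKeys.indexOf(k) !== -1).length;
  if (truePositives === 0) return 0;

  const precision = truePositives / predKeys.length;
  const recall = truePositives / goldKeys.length;
  return (2 * precision * recall) / (precision + recall);
}

const gold: LabeledSpan[] = [
  { label: "person", start: 0, end: 5 },
  { label: "organization", start: 10, end: 15 },
  { label: "date", start: 20, end: 25 },
];

const predicted: LabeledSpan[] = [
  { label: "person", start: 0, end: 5 },         // correct
  { label: "organization", start: 10, end: 14 }, // wrong boundary
  { label: "date", start: 20, end: 25 },         // correct
  { label: "product", start: 30, end: 35 },      // spurious
];

// 2 true positives: precision = 2/4, recall = 2/3, F1 = 4/7.
console.log(f1Score(gold, predicted).toFixed(3)); // "0.571"
```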
Performance
Specification
Research Paper
GLiNER2: An Efficient Multi-Task Information Extraction System
arxiv.org
Build a pipeline with gliner2-base-v1
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.