gliner2-base-v1

by fastino

Unified NER, classification, and structured extraction in a single 205M CPU-efficient model

379K downloads/month

205M params


Identifiers

Model ID

fastino/gliner2-base-v1

Feature URI

mixpeek://document_extractor@v1/fastino_gliner2_base_v1
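The Feature URI above appears to follow a scheme://extractor@version/model layout. As a minimal sketch, assuming that layout holds in general (it is inferred from this single example, and parseFeatureUri and FeatureUri are hypothetical names, not part of the Mixpeek SDK), the parts could be split out like this:

```typescript
// Hypothetical helper for illustration; not part of the Mixpeek SDK.
// Assumes the layout mixpeek://<extractor>@<version>/<model>.
interface FeatureUri {
  extractor: string;
  version: string;
  model: string;
}

function parseFeatureUri(uri: string): FeatureUri | null {
  // Three capture groups: extractor name, version, model slug.
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (!match) return null; // not in the assumed layout
  const [, extractor, version, model] = match;
  return { extractor, version, model };
}

const parsed = parseFeatureUri(
  "mixpeek://document_extractor@v1/fastino_gliner2_base_v1"
);
console.log(parsed);
```

Returning null instead of throwing lets callers treat an unrecognized URI as an ordinary case rather than an exception.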

Overview

GLiNER 2 unifies named entity recognition, text classification, and hierarchical structured data extraction in a single 205M-parameter model built on a pretrained transformer encoder. Unlike pipelines that chain separate models, or LLM-based extraction that requires GPU infrastructure, GLiNER 2 runs efficiently on CPU. Its schema-based interface accepts natural-language descriptions of the types to extract.

On Mixpeek, GLiNER 2 powers lightweight entity extraction pipelines that run alongside heavier models without competing for GPU resources. Its zero-shot generalization across domains (matching GPT-4o on the CrossNER benchmark) makes it well suited to extracting custom entities from transcripts, OCR output, and document text without fine-tuning.

Architecture

Pretrained transformer encoder with multi-task composition heads for NER, classification, and structured extraction. 205M parameters. Schema-driven interface supporting natural language entity type descriptions, nested and overlapping spans, and configurable single or multi-label classification.
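To make the schema-driven interface more concrete, here is a rough sketch of how such a schema might be modeled on the client side. All type and function names below (EntityType, ClassificationTask, ExtractionSchema, Span, validateSchema, spansOverlap) are hypothetical and chosen for illustration; they are not the GLiNER 2 or Mixpeek API. The sketch only shows the ideas named above: entity types carrying natural-language descriptions, a single versus multi-label switch, and spans that may nest or overlap.

```typescript
// Hypothetical types for illustration; not the GLiNER 2 or Mixpeek API.
interface EntityType {
  label: string;
  description: string; // natural-language description of what to extract
}

interface ClassificationTask {
  name: string;
  labels: string[];
  multiLabel: boolean; // false = exactly one label, true = any subset
}

interface ExtractionSchema {
  entities: EntityType[];
  classifications: ClassificationTask[];
}

// A predicted span as character offsets into the input text (end exclusive).
interface Span {
  label: string;
  start: number;
  end: number;
}

const schema: ExtractionSchema = {
  entities: [
    { label: "organization", description: "Company or institution names" },
    { label: "product", description: "Named products or services" },
  ],
  classifications: [
    { name: "sentiment", labels: ["positive", "negative", "neutral"], multiLabel: false },
  ],
};

// Return a list of problems found in a schema; an empty list means it is usable.
function validateSchema(s: ExtractionSchema): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const e of s.entities) {
    if (seen.has(e.label)) problems.push(`duplicate entity label: ${e.label}`);
    seen.add(e.label);
    if (e.description.trim() === "") problems.push(`empty description: ${e.label}`);
  }
  for (const c of s.classifications) {
    if (c.labels.length < 2) problems.push(`task ${c.name} needs at least two labels`);
  }
  return problems;
}

// True when two spans share at least one character; nested spans count as overlapping.
function spansOverlap(a: Span, b: Span): boolean {
  return a.start < b.end && b.start < a.end;
}
```

Because nested and overlapping spans are supported, downstream code should not assume that extracted spans are disjoint; a check like spansOverlap is one way to detect the cases that need explicit handling.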

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "entity_extraction",
    version: "v1",
    parameters: { model_id: "fastino/gliner2-base-v1" },
  },
});