NEWVector Store Object Storage — 50x cheaper.Read the post →
    Models/Text Extraction/zai-org/GLM-OCR
    HFOCRMIT

    GLM-OCR

    by zai-org

    #1 document OCR at 0.9B — MIT licensed, edge-deployable

    520Kdl/month
    0.9Bparams
    Identifiers
    Model ID
    zai-org/GLM-OCR
    Feature URI
    mixpeek://image_extractor@v1/zai_glm_ocr_v1

    Overview

    GLM-OCR is a tiny (0.9B parameter) multimodal OCR model built on the GLM-V encoder-decoder architecture. Despite its small size, it ranks #1 on OmniDocBench V1.5 (94.62 overall score), outperforming models 10x its size on complex document understanding tasks including tables, formulas, handwriting, and multi-column layouts.

    Its MIT license and sub-1B parameter count make it ideal for edge deployment, serverless functions, and cost-sensitive pipelines. On Mixpeek, GLM-OCR powers document text extraction for PDFs, scanned images, and screenshots where high accuracy matters more than raw throughput.

    Architecture

    GLM-V encoder-decoder with vision encoder (ViT variant) and autoregressive text decoder. 0.9B total parameters. Processes document images at native resolution with adaptive tiling for multi-page documents.

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    await mx.collections.ingest({
    collection_id: "my-collection",
    source: { url: "https://example.com/document.pdf" },
    feature_extractors: [{
    name: "ocr",
    version: "v1",
    params: {
    model_id: "zai-org/GLM-OCR"
    }
    }]
    });

    Capabilities

    • #1 on OmniDocBench V1.5 (94.62 overall)
    • Tables, formulas, handwriting, multi-column layout support
    • Only 0.9B parameters — runs on edge devices and serverless
    • MIT license for unrestricted commercial use

    Use Cases on Mixpeek

    Document digitization pipelines for scanned archives
    Edge OCR for mobile document capture and processing
    Cost-efficient text extraction at scale in serverless environments
    High-accuracy table and formula extraction from academic papers

    Benchmarks

    DatasetMetricScoreSource
    OmniDocBench V1.5 (overall)Score94.62ZAI, 2026 — Model Card

    Performance

    Input SizeVariable resolution (adaptive tiling)
    GPU Latency~530ms / page (A100)
    GPU Throughput~1.86 pages/sec (A100)
    GPU Memory~2.1 GB

    Specification

    FrameworkHF
    Organizationzai-org
    FeatureOCR
    Outputtext + bbox
    Modalitiesvideo, image, document
    RetrieverText-in-Image
    Parameters0.9B
    LicenseMIT
    Downloads/mo520K

    Research Paper

    GLM-OCR: A Compact Multimodal OCR Model

    arxiv.org

    Build a pipeline with GLM-OCR

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.

    Open Studio