GLM-OCR
by zai-org
#1 document OCR at 0.9B — MIT licensed, edge-deployable
zai-org/GLM-OCR
mixpeek://image_extractor@v1/zai_glm_ocr_v1
Overview
GLM-OCR is a compact (0.9B-parameter) multimodal OCR model built on the GLM-V encoder-decoder architecture. Despite its small size, it ranks #1 on OmniDocBench V1.5 with an overall score of 94.62, outperforming models 10x its size on complex document understanding tasks such as tables, formulas, handwriting, and multi-column layouts.
Its MIT license and sub-1B parameter count make it ideal for edge deployment, serverless functions, and cost-sensitive pipelines. On Mixpeek, GLM-OCR powers document text extraction for PDFs, scanned images, and screenshots where high accuracy matters more than raw throughput.
Architecture
GLM-OCR uses the GLM-V encoder-decoder design: a vision encoder (a ViT variant) feeds an autoregressive text decoder, for 0.9B parameters in total. It processes document images at native resolution, with adaptive tiling for multi-page documents.
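The tiling scheme itself is not described here. As a rough illustration of the general idea only, the sketch below splits a page of integer pixel dimensions into a grid of evenly sized tiles. The names `Tile` and `planTiles`, the 1024-pixel limit, and the no-overlap grid are all assumptions made for this example, not GLM-OCR's actual behavior.

```typescript
// Hypothetical sketch: split a page image into a grid of tiles no larger than
// maxTile pixels per side. This is NOT GLM-OCR's actual tiling scheme.
interface Tile {
  x: number;
  y: number;
  width: number;
  height: number;
}

function planTiles(pageWidth: number, pageHeight: number, maxTile = 1024): Tile[] {
  if (pageWidth <= 0 || pageHeight <= 0 || maxTile <= 0) {
    throw new RangeError("dimensions and maxTile must be positive");
  }
  // Count tiles per axis first, then even out the tile size so the last
  // row and column are not thin slivers.
  const cols = Math.ceil(pageWidth / maxTile);
  const rows = Math.ceil(pageHeight / maxTile);
  const tileW = Math.ceil(pageWidth / cols);
  const tileH = Math.ceil(pageHeight / rows);

  const tiles: Tile[] = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      const x = c * tileW;
      const y = r * tileH;
      tiles.push({
        x,
        y,
        // Clamp edge tiles so they never extend past the page.
        width: Math.min(tileW, pageWidth - x),
        height: Math.min(tileH, pageHeight - y),
      });
    }
  }
  return tiles;
}
```

For a multi-page document, a function like this would be applied to each page independently, since pages may differ in size.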
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/document.pdf" },
  feature_extractors: [
    {
      name: "ocr",
      version: "v1",
      params: { model_id: "zai-org/GLM-OCR" }
    }
  ]
});
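When the same request is issued for many documents, it can help to build the payload in one place. The following is a minimal sketch that mirrors the payload shape from the SDK example above. The types `FeatureExtractor` and `IngestRequest` and the helper `buildOcrIngestRequest` are hypothetical names defined locally for illustration; they are not taken from the Mixpeek SDK.

```typescript
// Local, illustrative types that mirror the payload in the SDK example.
// They are assumptions, not the Mixpeek SDK's own type definitions.
interface FeatureExtractor {
  name: string;
  version: string;
  params: { model_id: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

const GLM_OCR_MODEL_ID = "zai-org/GLM-OCR";

// Hypothetical helper: validates its inputs and returns an ingest payload
// that selects GLM-OCR as the OCR feature extractor.
function buildOcrIngestRequest(collectionId: string, documentUrl: string): IngestRequest {
  if (collectionId.trim() === "") {
    throw new Error("collectionId must not be empty");
  }
  if (!/^https?:\/\/\S+$/.test(documentUrl)) {
    throw new Error("documentUrl must be an http(s) URL");
  }
  return {
    collection_id: collectionId,
    source: { url: documentUrl },
    feature_extractors: [
      { name: "ocr", version: "v1", params: { model_id: GLM_OCR_MODEL_ID } },
    ],
  };
}
```

Assuming `mx.collections.ingest` accepts the same object shape as in the example above, the result could be passed to it directly, for instance `await mx.collections.ingest(buildOcrIngestRequest("my-collection", "https://example.com/document.pdf"))`.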
Capabilities
- #1 on OmniDocBench V1.5 (94.62 overall)
- Tables, formulas, handwriting, multi-column layout support
- Only 0.9B parameters — runs on edge devices and serverless
- MIT license for unrestricted commercial use
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| OmniDocBench V1.5 (overall) | Score | 94.62 | ZAI, 2026 — Model Card |
Research Paper
GLM-OCR: A Compact Multimodal OCR Model
arxiv.org
Build a pipeline with GLM-OCR
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.