    PyTorch · Object Detection · GPL-3.0

    YOLO-World-L

    by AILab-CVC

    Real-time open-vocabulary object detection with text prompts

    320K downloads/month
    ~100M parameters
    Identifiers
    Model ID: AILab-CVC/YOLO-World-L
    Feature URI: mixpeek://image_extractor@v1/tencent_yoloworld_large_v1
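
    The Feature URI appears to pack an extractor name, a version, and a feature name into one string. The sketch below splits it apart, assuming the layout mixpeek://<extractor>@<version>/<feature>. That layout is inferred from this single example, and the FeatureUri type and parseFeatureUri helper are hypothetical names, not part of the Mixpeek SDK.

```typescript
// Hypothetical shape for the parts of a Mixpeek feature URI.
// Field names are illustrative, not documented SDK types.
interface FeatureUri {
  extractor: string; // e.g. "image_extractor"
  version: string;   // e.g. "v1"
  feature: string;   // e.g. "tencent_yoloworld_large_v1"
}

// Assumed layout: mixpeek://<extractor>@<version>/<feature>
const FEATURE_URI_PATTERN = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/;

// Returns the parsed parts, or null when the string does not match the assumed layout.
function parseFeatureUri(uri: string): FeatureUri | null {
  const match = FEATURE_URI_PATTERN.exec(uri);
  if (match === null) return null;
  const [, extractor, version, feature] = match;
  return { extractor, version, feature };
}

const parsedUri = parseFeatureUri(
  "mixpeek://image_extractor@v1/tencent_yoloworld_large_v1"
);
console.log(parsedUri);
```

    Returning null rather than throwing keeps the helper usable for validating user-supplied strings.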

    Overview

    YOLO-World extends the YOLO detector family with open-vocabulary detection via vision-language modeling. Users specify objects to detect with text prompts; the model finds them zero-shot at real-time speeds (52 FPS on V100).

    On Mixpeek, YOLO-World enables detecting arbitrary objects in video and images using natural language, without retraining for each new category.

    Architecture

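    YOLO-World combines a YOLO backbone with a Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN), which fuses image features with text embeddings. It is trained with a region-text contrastive loss and follows a prompt-then-detect paradigm: the prompt vocabulary is encoded once and its embeddings are folded into the model parameters, so inference does not have to re-encode the text for every image.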
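
    To make the region-text matching concrete, here is a minimal sketch of how a region embedding can be scored against a precomputed vocabulary and how a contrastive loss can be formed from those scores. It is a simplification under stated assumptions: the embeddings are toy 3-dimensional vectors, the scale and shift values are arbitrary stand-ins for learned scalars, and softmax cross-entropy is used for illustration, so the exact loss in the released model may differ. All function names are hypothetical.

```typescript
// L2-normalize a vector so that dot products become cosine similarities.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Similarity logits between one region embedding and every vocabulary embedding.
// `scale` and `shift` stand in for learned scalars; the defaults are arbitrary.
function regionTextLogits(
  region: number[],
  vocab: number[][],
  scale = 10,
  shift = 0
): number[] {
  const r = l2Normalize(region);
  return vocab.map((w) => scale * dot(r, l2Normalize(w)) + shift);
}

// Numerically stable softmax over the vocabulary.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const total = exps.reduce((s, e) => s + e, 0);
  return exps.map((e) => e / total);
}

// Cross-entropy of a region's logits against the index of its assigned text.
function regionTextContrastiveLoss(logits: number[], targetIndex: number): number {
  return -Math.log(softmax(logits)[targetIndex]);
}

// Toy vocabulary, embedded once up front (the "prompt" step).
const vocabulary = ["dog", "bicycle", "traffic cone"];
const vocabEmbeddings = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
];

// Toy region embedding produced at detection time (the "detect" step).
const regionEmbedding = [0.9, 0.1, 0.0];
const regionLogits = regionTextLogits(regionEmbedding, vocabEmbeddings);
const regionProbs = softmax(regionLogits);
const bestIndex = regionProbs.indexOf(Math.max(...regionProbs));

console.log(vocabulary[bestIndex]);
console.log(regionTextContrastiveLoss(regionLogits, 0));
```

    Because the vocabulary embeddings are computed before any image is seen, changing the set of detectable categories only means swapping the vocabEmbeddings array. That is the property the prompt-then-detect paradigm relies on.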

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a video and run YOLO-World-L object detection on it.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/video.mp4" },
      feature_extractors: [{
        name: "object_detection",
        version: "v1",
        params: { model_id: "AILab-CVC/YOLO-World-L" }
      }]
    });

    Capabilities

    • Open-vocabulary detection with text prompts
    • 52 FPS on V100 (real-time)
    • 35.4 AP on LVIS zero-shot
    • Supports image-prompted detection
    • ONNX and TFLite INT8 export
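
    The throughput figure translates into a per-frame latency budget. The short calculation below converts 52 FPS into milliseconds per frame and compares it with a 30 FPS input stream. The stream rate is an assumed example, and the result ignores decoding, pre-processing, and post-processing time, as well as hardware other than a V100.

```typescript
// Per-frame latency implied by a throughput figure, in milliseconds.
function msPerFrame(fps: number): number {
  return 1000 / fps;
}

// Time left over per frame when a detector running at `detectorFps`
// processes every frame of a stream arriving at `streamFps`.
function headroomMs(detectorFps: number, streamFps: number): number {
  return msPerFrame(streamFps) - msPerFrame(detectorFps);
}

const detectorFps = 52; // figure quoted on this page, measured on a V100
const streamFps = 30;   // assumed input video rate

console.log(msPerFrame(detectorFps).toFixed(1));            // "19.2"
console.log(headroomMs(detectorFps, streamFps).toFixed(1)); // "14.1"
```

    A positive headroom means the detector can, in principle, keep up with every frame. A negative value would call for frame skipping or batching.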

    Use Cases on Mixpeek

    • Real-time video monitoring for arbitrary object types
    • Content moderation with dynamically defined categories
    • Retail inventory tracking with custom product lists
    • Open-ended visual question answering pipelines

    Specification

    Framework: PyTorch
    Organization: AILab-CVC
    Feature: Object Detection
    Output: bbox + label
    Modalities: video, image
    Retriever: Object Filter
    Parameters: ~100M
    License: GPL-3.0
    Downloads/mo: 320K
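
    Since the output is a bounding box plus a label, and the associated retriever is an object filter, downstream code will typically narrow detections by label. The sketch below shows one way to do that on the client side. The Detection type is hypothetical: the page does not document the actual response schema, so the bbox layout and the score field are assumptions.

```typescript
// Hypothetical detection record; the page only states "bbox + label".
interface Detection {
  label: string;
  bbox: [number, number, number, number]; // assumed [x, y, width, height] in pixels
  score: number;                          // assumed confidence in [0, 1]
}

// Keep detections whose label is in `labels` (case-insensitive)
// and whose score is at least `minScore`.
function filterDetections(
  detections: Detection[],
  labels: string[],
  minScore: number
): Detection[] {
  const wanted = new Set(labels.map((l) => l.toLowerCase()));
  return detections.filter(
    (d) => wanted.has(d.label.toLowerCase()) && d.score >= minScore
  );
}

// Made-up sample data for illustration only.
const sampleDetections: Detection[] = [
  { label: "forklift", bbox: [40, 60, 200, 180], score: 0.82 },
  { label: "pallet", bbox: [300, 220, 120, 90], score: 0.67 },
  { label: "Forklift", bbox: [500, 80, 190, 170], score: 0.31 },
];

const forklifts = filterDetections(sampleDetections, ["forklift"], 0.5);
console.log(forklifts.length); // 1
```

    Matching labels case-insensitively is a design choice for free-text prompts, where "Forklift" and "forklift" should refer to the same category.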

    Research Paper

    YOLO-World: Real-Time Open-Vocabulary Object Detection

    arxiv.org

    Build a pipeline with YOLO-World-L

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
