    HF | Object Detection | Apache 2.0

    rf-detr-base

    by roboflow

    First real-time detection transformer to break 60 AP on COCO, built on DINOv2

    420K downloads/month
    29M parameters
    Identifiers
    Model ID
    roboflow/rf-detr-base
    Feature URI
    mixpeek://image_extractor@v1/roboflow_rf_detr_base_v1

    Overview

    RF-DETR is a real-time object detection architecture developed by Roboflow that combines a DINOv2 vision transformer backbone with deformable DETR decoding. It eliminates traditional detection components such as anchor boxes and NMS, and uses neural architecture search to find encoder-decoder configurations that balance speed and accuracy. The family ranges from Nano (2.3 ms latency) to 2XL (60.1 AP on COCO).

    On Mixpeek, RF-DETR Base provides the best speed-accuracy tradeoff for real-time object detection pipelines, processing video frames at over 150 FPS on GPU while maintaining 53.3 AP on COCO. Its strong fine-tuning transfer makes it ideal for domain-specific detection tasks on both large and small custom datasets.

    Architecture

    RF-DETR Base pairs a DINOv2 ViT backbone with a deformable-attention decoder, for 29M parameters in total. It is trained with a bipartite matching loss for set prediction, and its configuration was chosen via neural architecture search to optimize the latency-accuracy Pareto frontier. The model supports TensorRT FP16 export for production deployment.
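A bipartite matching loss first pairs each ground-truth object with exactly one prediction so that the summed matching cost is minimal, then computes the loss on those pairs. The sketch below illustrates only that assignment step, using an exhaustive search that is practical for tiny inputs; DETR-style training typically uses the Hungarian algorithm instead. The function name `matchMinCost` and the cost values are illustrative assumptions, not RF-DETR's training code.

```typescript
// cost[i][j] = cost of assigning ground-truth i to prediction j.
// Assumes there are at least as many predictions as ground truths.
function matchMinCost(cost: number[][]): { assignment: number[]; total: number } {
  const numGt = cost.length;
  const numPred = numGt === 0 ? 0 : cost[0].length;
  let best = { assignment: [] as number[], total: Infinity };
  const used = new Array<boolean>(numPred).fill(false);
  const current: number[] = [];

  // Try every one-to-one assignment of ground truths to distinct predictions.
  function search(gt: number, running: number): void {
    if (gt === numGt) {
      if (running < best.total) {
        best = { assignment: [...current], total: running };
      }
      return;
    }
    for (let p = 0; p < numPred; p++) {
      if (used[p]) continue;
      used[p] = true;
      current.push(p);
      search(gt + 1, running + cost[gt][p]);
      current.pop();
      used[p] = false;
    }
  }

  search(0, 0);
  return best;
}

// Hypothetical costs: 2 ground-truth boxes, 3 predicted queries.
const match = matchMinCost([
  [0.9, 0.1, 0.5],
  [0.2, 0.8, 0.7],
]);
console.log(match.assignment); // index of the prediction matched to each ground truth
```

Predictions left unmatched (here, one of the three) are treated as "no object" in set-prediction training, which is why no NMS step is needed at inference time.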

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/video.mp4" },
      feature_extractors: [{
        name: "object_detection",
        version: "v1",
        params: {
          model_id: "roboflow/rf-detr-base"
        }
      }]
    });
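When several files need the same detector configuration, the request body can be built by a small helper. The following is a sketch only: the field names mirror the ingest call above, while `buildIngestRequest` and the interface names are hypothetical and not part of the Mixpeek SDK.

```typescript
// Shapes mirror the fields used in the ingest call; the type names are illustrative.
interface FeatureExtractor {
  name: string;
  version: string;
  params: { model_id: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

// Hypothetical helper: builds one request body per source URL.
function buildIngestRequest(
  collectionId: string,
  url: string,
  modelId: string = "roboflow/rf-detr-base",
): IngestRequest {
  return {
    collection_id: collectionId,
    source: { url },
    feature_extractors: [
      { name: "object_detection", version: "v1", params: { model_id: modelId } },
    ],
  };
}

const urls = ["https://example.com/a.mp4", "https://example.com/b.mp4"];
const requests = urls.map((u) => buildIngestRequest("my-collection", u));
console.log(requests.length); // one request body per URL
```

Each element of `requests` could then be passed to `mx.collections.ingest(...)` in place of the inline object literal.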

    Capabilities

    • 53.3 AP on COCO val2017 at base size
    • Real-time inference at ~6ms / image (T4 TensorRT FP16)
    • DINOv2 backbone enables strong domain transfer
    • NMS-free end-to-end detection pipeline
    • Scales from Nano (2.3ms) to 2XL (60.1 AP)
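Because detection is NMS-free, post-processing can reduce to a confidence threshold instead of overlap-based suppression. The sketch below assumes each detection carries a `score` field alongside its bbox and label; that field, the `Detection` type, and the 0.5 default are assumptions for illustration, not a documented Mixpeek response format.

```typescript
// Hypothetical detection record: bbox as [x1, y1, x2, y2], plus label and confidence.
interface Detection {
  bbox: [number, number, number, number];
  label: string;
  score: number;
}

// Keep detections at or above the threshold; no pairwise overlap checks are involved.
function keepConfident(dets: Detection[], minScore: number = 0.5): Detection[] {
  return dets.filter((d) => d.score >= minScore);
}

const sample: Detection[] = [
  { bbox: [10, 20, 110, 220], label: "person", score: 0.92 },
  { bbox: [300, 40, 380, 90], label: "bottle", score: 0.31 },
];
console.log(keepConfident(sample).map((d) => d.label));
```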

    Use Cases on Mixpeek

    • Real-time video surveillance with high-throughput object detection across camera feeds
    • Quality inspection in manufacturing, detecting defects on production lines at frame rate
    • Retail shelf analytics, counting and classifying products with sub-10ms latency

    Benchmarks

    Dataset                      | Metric  | Score | Source
    COCO val2017                 | AP50:95 | 53.3  | Roboflow, 2025, RF-DETR Benchmarks
    COCO val2017 (Large variant) | AP50:95 | 56.5  | Roboflow, 2025, RF-DETR Benchmarks
    COCO val2017 (2XL variant)   | AP50:95 | 60.1  | Roboflow, 2025, RF-DETR Benchmarks
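AP50:95 follows the COCO convention of averaging AP over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below shows the two building blocks behind that metric: the IoU of two boxes and the threshold list. It assumes boxes in `[x1, y1, x2, y2]` format and does not compute AP itself; the function names are illustrative.

```typescript
type Box = [number, number, number, number]; // [x1, y1, x2, y2]

function area(b: Box): number {
  return Math.max(0, b[2] - b[0]) * Math.max(0, b[3] - b[1]);
}

// Intersection over union of two axis-aligned boxes.
function iou(a: Box, b: Box): number {
  const iw = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const ih = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = iw * ih;
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

// The ten IoU thresholds 0.50, 0.55, ..., 0.95.
const iouThresholds: number[] = Array.from({ length: 10 }, (_, i) => (50 + 5 * i) / 100);

// Number of thresholds at which a prediction would count as a match for a ground truth.
function thresholdsPassed(pred: Box, gt: Box): number {
  const overlap = iou(pred, gt);
  return iouThresholds.filter((t) => overlap >= t).length;
}

console.log(thresholdsPassed([0, 0, 10, 8], [0, 0, 10, 10])); // IoU is 0.8 for this pair
```

A loosely localized box passes only the lower thresholds, which is why AP50:95 rewards tight boxes more than AP at IoU 0.50 alone.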

    Performance

    Input Size: 560×560 px (default)
    GPU Latency: ~6 ms / image (T4 TensorRT FP16)
    GPU Throughput: ~165 images/sec (T4)
    GPU Memory: ~1.2 GB
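The latency and throughput figures are two views of the same measurement: with one image processed at a time, throughput is roughly 1000 divided by the latency in milliseconds, so ~6 ms per image corresponds to roughly 165 images per second. The sketch below works through that arithmetic and estimates processing time for a video. It assumes sequential, batch-size-1 inference with no decoding or I/O overhead, and the function names are illustrative.

```typescript
// Images per second at batch size 1, given per-image latency in milliseconds.
function throughputFromLatency(latencyMs: number): number {
  return 1000 / latencyMs;
}

// Seconds needed to run detection on every frame, ignoring decode and transfer time.
function secondsToProcess(frameCount: number, latencyMs: number): number {
  return (frameCount * latencyMs) / 1000;
}

const latencyMs = 6; // approximate figure from the table above
const frames = 10 * 60 * 30; // a 10-minute clip at 30 frames per second

console.log(Math.round(throughputFromLatency(latencyMs)), "images/sec");
console.log(secondsToProcess(frames, latencyMs), "seconds for", frames, "frames");
```

Under these assumptions a 10-minute, 30 fps clip (18,000 frames) takes about 108 seconds of pure inference time, so end-to-end throughput in practice is likely to be bounded by video decoding and data transfer as well.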

    Specification

    Framework: HF
    Organization: roboflow
    Feature: Object Detection
    Output: bbox + label
    Modalities: video, image
    Retriever: Object Filter
    Parameters: 29M
    License: Apache 2.0
    Downloads/mo: 420K

    Research Paper

    RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

    arxiv.org

    Build a pipeline with rf-detr-base

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
