rf-detr-base

by Roboflow

Base size of RF-DETR, the first real-time detection transformer family to break 60 AP on COCO (2XL variant); built on DINOv2

420K downloads/month

29M params


Identifiers

Model ID

roboflow/rf-detr-base

Feature URI

mixpeek://image_extractor@v1/roboflow_rf_detr_base_v1

Overview

RF-DETR is a real-time object detection architecture developed by Roboflow that pairs a DINOv2 vision transformer backbone with a deformable-attention DETR decoder. It drops hand-designed detection components such as anchor boxes and non-maximum suppression (NMS), and uses neural architecture search to find encoder-decoder configurations that balance speed and accuracy across model sizes, from Nano (2.3 ms latency) to 2XL (60.1 AP).

On Mixpeek, RF-DETR Base provides the best speed-accuracy tradeoff for real-time object detection pipelines, processing video frames at over 150 FPS on GPU (roughly 6 ms per image on an NVIDIA T4 with TensorRT FP16) while maintaining 53.3 AP on COCO val2017. Because its DINOv2 backbone transfers well under fine-tuning, it suits domain-specific detection tasks on both large and small custom datasets.
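The 150 FPS figure is consistent with the ~6 ms per-image latency listed under Capabilities (T4, TensorRT FP16). A back-of-the-envelope sketch, assuming frames are processed one at a time and ignoring video decode, pre-processing, and any batching gains; `fpsFromLatency` and `streamsPerGpu` are hypothetical helpers, not part of the Mixpeek SDK.

```typescript
// Convert per-image latency (milliseconds) into frames per second,
// assuming frames are processed one at a time with no pipelining.
function fpsFromLatency(latencyMs: number): number {
  return 1000 / latencyMs;
}

// How many camera streams at a given frame rate one GPU could keep up with,
// under the same batch-1, inference-only assumption (decode and
// pre/post-processing are ignored).
function streamsPerGpu(latencyMs: number, streamFps: number): number {
  return Math.floor(fpsFromLatency(latencyMs) / streamFps);
}

const fps = fpsFromLatency(6); // ~6 ms per image on a T4 (TensorRT FP16)
console.log(fps.toFixed(1)); // 166.7
console.log(streamsPerGpu(6, 30)); // 5
```

Real throughput on a video pipeline will be lower once decoding and data transfer are included, so treat these numbers as an upper bound.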

Architecture

DINOv2 ViT backbone with a deformable attention decoder. 29M parameters. Uses a bipartite matching loss for set prediction. Designed via neural architecture search to target the latency-accuracy Pareto frontier. Supports TensorRT FP16 export for production deployment.
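Bipartite matching is what lets a DETR-style model train without NMS: each ground-truth object is assigned to exactly one prediction, and unmatched predictions are trained to output "no object". DETR-family detectors typically solve this assignment with the Hungarian algorithm over a cost that combines class and box terms. The toy version here swaps in exhaustive search and a box-only L1 cost so the idea fits in a few lines; it is an illustration, not RF-DETR's actual matcher, and `matchBoxes` is a hypothetical name.

```typescript
// Boxes are [cx, cy, w, h] in normalized coordinates (an assumption for this sketch).
type Box = [number, number, number, number];

// L1 distance between two boxes, used here as the only matching cost.
function l1Cost(a: Box, b: Box): number {
  return a.reduce((sum, v, i) => sum + Math.abs(v - b[i]), 0);
}

// Exhaustively tries every one-to-one assignment of ground-truth boxes to
// predictions and returns the lowest-cost one. Requires preds.length >= targets.length
// (DETR-style models emit more queries than objects). Exponential: toy use only.
function matchBoxes(preds: Box[], targets: Box[]): number[] {
  let bestCost = Infinity;
  let best: number[] = [];
  const used = new Array<boolean>(preds.length).fill(false);
  const current: number[] = [];

  function search(t: number, cost: number): void {
    if (cost >= bestCost) return; // prune: already no better than the best found
    if (t === targets.length) {
      bestCost = cost;
      best = [...current];
      return;
    }
    for (let p = 0; p < preds.length; p++) {
      if (used[p]) continue;
      used[p] = true;
      current.push(p);
      search(t + 1, cost + l1Cost(preds[p], targets[t]));
      current.pop();
      used[p] = false;
    }
  }

  search(0, 0);
  return best; // best[t] = index of the prediction matched to target t
}

const preds: Box[] = [
  [0.2, 0.2, 0.1, 0.1],
  [0.8, 0.8, 0.2, 0.2],
  [0.5, 0.5, 0.3, 0.3], // extra query: left unmatched, i.e. "no object"
];
const targets: Box[] = [
  [0.79, 0.81, 0.2, 0.2],
  [0.21, 0.2, 0.1, 0.1],
];
console.log(matchBoxes(preds, targets)); // [1, 0]: target 0 -> pred 1, target 1 -> pred 0
```

Because every object gets exactly one matched prediction during training, duplicate boxes are penalized by the loss itself rather than removed afterward.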

Mixpeek SDK Integration

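import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key (e.g. read it from an environment variable).
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Top-level await requires an ES module context; otherwise wrap this in an async function.
// Ingest a video into a collection and run RF-DETR Base object detection on it.
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [{
    name: "object_detection",
    version: "v1",
    params: {
      model_id: "roboflow/rf-detr-base" // Model ID listed under Identifiers
    }
  }]
});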

Capabilities

53.3 AP on COCO val2017 (Base size)
Real-time inference at ~6 ms per image (NVIDIA T4, TensorRT FP16)
DINOv2 backbone enables strong domain transfer
NMS-free end-to-end detection pipeline
Family scales from Nano (2.3 ms latency) to 2XL (60.1 AP)
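Because the model is NMS-free, turning raw decoder outputs into detections needs only scoring and selection. This sketch follows the top-k post-processing used by Deformable DETR-style detectors: apply a sigmoid to each (query, class) logit, keep the k highest scores, and drop those under a confidence threshold. The `QueryOutput` shape, the `postprocess` function, and the top-k and threshold values are hypothetical; RF-DETR's own post-processing may differ in details.

```typescript
// One decoder query's output: per-class logits plus a predicted box (hypothetical shape).
interface QueryOutput {
  logits: number[]; // one logit per class
  box: [number, number, number, number];
}

interface Detection {
  query: number;
  classId: number;
  score: number;
  box: [number, number, number, number];
}

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Score every (query, class) pair, keep the top-k, then apply a confidence
// threshold. There is no NMS step: duplicate suppression is expected to come
// from the one-to-one matching used during training.
function postprocess(outputs: QueryOutput[], topK: number, threshold: number): Detection[] {
  const candidates: Detection[] = [];
  outputs.forEach((q, query) => {
    q.logits.forEach((logit, classId) => {
      candidates.push({ query, classId, score: sigmoid(logit), box: q.box });
    });
  });
  candidates.sort((a, b) => b.score - a.score);
  return candidates.slice(0, topK).filter((d) => d.score >= threshold);
}

const outputs: QueryOutput[] = [
  { logits: [3.0, -2.0], box: [0.5, 0.5, 0.2, 0.3] },
  { logits: [-4.0, 1.5], box: [0.2, 0.7, 0.1, 0.1] },
  { logits: [-5.0, -5.0], box: [0.9, 0.1, 0.05, 0.05] },
];
const dets = postprocess(outputs, 100, 0.5); // top-k and threshold chosen arbitrarily
// Two detections survive: query 0 as class 0 (0.953) and query 1 as class 1 (0.818).
console.log(dets.map((d) => [d.query, d.classId, d.score.toFixed(3)]));
```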

Use Cases on Mixpeek

Real-time video surveillance with high-throughput object detection across camera feeds

Quality inspection in manufacturing, detecting defects on production lines at frame rate

Retail shelf analytics, counting and classifying products with sub-10 ms latency
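For shelf analytics, counting typically reduces to tallying detections per class above a confidence cutoff. A minimal sketch, assuming a hypothetical `Detection` record with `label` and `confidence` fields; the fields actually returned by a Mixpeek pipeline may be named differently, and the 0.5 cutoff is arbitrary.

```typescript
// Hypothetical detection record; real pipeline output fields may differ.
interface Detection {
  label: string;
  confidence: number;
}

// Count detections per label, skipping any below the confidence threshold.
function countByLabel(detections: Detection[], minConfidence: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (const d of detections) {
    if (d.confidence < minConfidence) continue;
    counts.set(d.label, (counts.get(d.label) ?? 0) + 1);
  }
  return counts;
}

const shelf: Detection[] = [
  { label: "cereal_box", confidence: 0.91 },
  { label: "cereal_box", confidence: 0.88 },
  { label: "soda_can", confidence: 0.76 },
  { label: "soda_can", confidence: 0.32 }, // below threshold, not counted
];
console.log(Object.fromEntries(countByLabel(shelf, 0.5))); // { cereal_box: 2, soda_can: 1 }
```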

Benchmarks

Dataset	Variant	Metric	Score	Source
COCO val2017	Base	AP50:95	53.3	Roboflow (2025), RF-DETR Benchmarks
COCO val2017	Large	AP50:95	56.5	Roboflow (2025), RF-DETR Benchmarks
COCO val2017	2XL	AP50:95	60.1	Roboflow (2025), RF-DETR Benchmarks
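AP50:95, the metric in the table, is COCO's primary detection score: average precision averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. This sketch covers only the IoU side of that definition, showing how a single prediction can count as a hit at lenient thresholds and a miss at strict ones; full AP also requires ranking detections and integrating precision over recall, which is omitted. Boxes are assumed to be in [x1, y1, x2, y2] corner format.

```typescript
// Axis-aligned box as [x1, y1, x2, y2] (corner format, an assumption here).
type Box = [number, number, number, number];

// Intersection over union of two boxes.
function iou(a: Box, b: Box): number {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const area = (r: Box) => (r[2] - r[0]) * (r[3] - r[1]);
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

// The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
const thresholds = Array.from({ length: 10 }, (_, i) => 0.5 + 0.05 * i);

// A prediction is a true positive at threshold t only if IoU >= t.
const pred: Box = [10, 10, 50, 50];
const gt: Box = [12, 12, 52, 52];
const overlap = iou(pred, gt);
const hits = thresholds.filter((t) => overlap >= t).length;
console.log(overlap.toFixed(3), `${hits}/10 thresholds`); // 0.822 7/10 thresholds
```

A box offset by just 2 pixels still passes at 0.50 through 0.80 but fails at 0.85 and above, which is why AP50:95 rewards tight localization more than AP50 alone.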