dinov3-large
by facebook
Next-generation self-supervised vision model with Gram anchoring and 6.7B scaling
facebook/dinov3-large
mixpeek://image_extractor@v1/facebook_dinov3_large_v1
Overview
DINOv3 is Meta AI's successor to DINOv2, introducing Gram anchoring to counter the degradation of dense features over long training schedules. It scales up to 6.7B parameters (ViT-7B) and is trained on 1.7B web images plus 493M satellite images, making it one of the most versatile vision foundation models available.
On Mixpeek, DINOv3 delivers state-of-the-art visual features for tasks ranging from classification and segmentation to satellite/aerial imagery analysis, all without fine-tuning.
Architecture
Vision Transformer with patch size 16, scaling from ViT-S (21M params) to ViT-7B (6.7B params). Gram anchoring stabilizes dense features during extended training, and the large teacher is also distilled into smaller ViT and ConvNeXt backbones. Supports flexible input resolution and post-hoc text alignment.
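Because the patch size is 16, the number of dense feature tokens grows with input resolution. A minimal sketch of that arithmetic (illustrative helper, not part of any SDK; assumes the resolution is divisible by the patch size, as with DINOv3's flexible-resolution inference):

```typescript
// Compute the dense feature grid for a ViT with a given patch size.
function patchGrid(height: number, width: number, patchSize = 16) {
  const rows = Math.floor(height / patchSize);
  const cols = Math.floor(width / patchSize);
  return { rows, cols, tokens: rows * cols };
}

// At 224x224, a 16-pixel patch yields a 14x14 grid of 196 patch tokens.
console.log(patchGrid(224, 224)); // { rows: 14, cols: 14, tokens: 196 }
// Doubling the resolution quadruples the dense tokens (28x28 = 784).
console.log(patchGrid(448, 448));
```

This is why higher-resolution inputs give finer dense features for segmentation at a quadratic cost in tokens.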
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest an image and embed it with DINOv3-large.
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/satellite.tiff" },
  feature_extractors: [{
    name: "image_embedding",
    version: "v1",
    params: { model_id: "facebook/dinov3-large" }
  }]
});

Capabilities
- Gram anchoring for stable dense feature training
- Scales up to 6.7B parameters (ViT-7B)
- Trained on 1.7B web + 493M satellite images
- ViT and ConvNeXt backbone variants
- Multi-domain: natural images and satellite/aerial imagery
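Embeddings produced by an extractor like this are typically compared with cosine similarity. A self-contained sketch of that comparison (independent of the Mixpeek SDK; the vectors are toy values, not real DINOv3 outputs):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 4-dimensional vectors standing in for image embeddings.
const query = [0.1, 0.3, 0.5, 0.2];
const candidate = [0.1, 0.3, 0.5, 0.2];
console.log(cosineSimilarity(query, candidate)); // ≈ 1 for identical vectors
```

Scores near 1 indicate visually similar images; retrieval stages rank candidates by this score.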
Use Cases on Mixpeek
Specification
Research Paper
DINOv3
arxiv.org
Build a pipeline with dinov3-large
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
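A combined pipeline might be declared like this. The overall shape, the second extractor name, and the retrieval fields are illustrative assumptions, not the documented Mixpeek schema; only the `image_embedding` entry mirrors the ingest call above:

```typescript
// Hypothetical pipeline definition pairing DINOv3 with a second
// extractor and a retrieval stage for end-to-end search.
const pipeline = {
  collection_id: "my-collection",
  feature_extractors: [
    {
      name: "image_embedding",
      version: "v1",
      params: { model_id: "facebook/dinov3-large" }
    },
    // Hypothetical second extractor running in the same pass.
    { name: "text_extractor", version: "v1", params: {} }
  ],
  // Retrieval stage reusing the DINOv3 embeddings for nearest-neighbor search.
  retrieval: { stage: "knn", metric: "cosine", top_k: 10 }
};

console.log(pipeline.feature_extractors.length); // 2
```

Running both extractors in one ingest pass avoids re-fetching the source asset for each feature.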