
    dinov3-large

    by facebook

    Next-generation self-supervised vision model with Gram anchoring, scaling up to 6.7B parameters

    Downloads: 450K/month
    Parameters: 300M (Large), 6.7B (ViT-7B)
    Identifiers
    Model ID
    facebook/dinov3-large
    Feature URI
    mixpeek://image_extractor@v1/facebook_dinov3_large_v1
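
    The Feature URI above appears to pack three parts into one string: an extractor name, a version, and a feature name. The sketch below splits it with a regular expression. The `extractor@version/feature` layout and the `parseFeatureUri` helper are assumptions inferred from this single example, not a documented Mixpeek grammar or SDK function.

```typescript
// Hypothetical helper: the URI layout (extractor@version/feature) is inferred
// from the one example on this page, not from a published specification.
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

function parseFeatureUri(uri: string): FeatureUri {
  // mixpeek://<extractor>@<version>/<feature>
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Unrecognized feature URI: ${uri}`);
  }
  return { extractor: match[1], version: match[2], feature: match[3] };
}

const parsedUri = parseFeatureUri(
  "mixpeek://image_extractor@v1/facebook_dinov3_large_v1"
);
console.log(parsedUri.extractor, parsedUri.version, parsedUri.feature);
```

    A regular expression is used rather than the built-in `URL` class because `URL` would treat the part before `@` as a username and the part after it as a host.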

    Overview

    DINOv3 is Meta AI's successor to DINOv2. It introduces Gram anchoring to counter the degradation of dense (patch-level) features during long training schedules. The model family scales up to 6.7B parameters (ViT-7B) and is trained on 1.7 billion web images plus 493M satellite images, so it covers both natural and satellite/aerial imagery.

    On Mixpeek, DINOv3 delivers state-of-the-art visual features for tasks ranging from classification and segmentation to satellite/aerial imagery analysis, all without fine-tuning.
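
    To make the Gram anchoring idea from the overview concrete, here is a minimal sketch. It assumes the loss is the squared Frobenius distance between the Gram matrices (pairwise patch similarities) of L2-normalized patch features from the current model and from an earlier "anchor" checkpoint. The function names are illustrative, and the exact formulation in the DINOv3 paper may differ in details such as which checkpoint or resolution supplies the anchor.

```typescript
// Rows are patches, columns are feature dimensions.
type Matrix = number[][];

// Scale every row to unit length so dot products become cosine similarities.
function l2NormalizeRows(x: Matrix): Matrix {
  return x.map((row) => {
    const norm = Math.sqrt(row.reduce((sum, v) => sum + v * v, 0));
    return norm === 0 ? row.slice() : row.map((v) => v / norm);
  });
}

// Gram matrix: entry (i, j) is the dot product of patch i and patch j.
function gramMatrix(x: Matrix): Matrix {
  return x.map((a) => x.map((b) => a.reduce((sum, v, i) => sum + v * b[i], 0)));
}

// Assumed loss: squared Frobenius norm of the difference between the two
// Gram matrices. It penalizes drift in patch-to-patch similarity structure,
// not drift in the raw feature values themselves.
function gramAnchoringLoss(student: Matrix, anchor: Matrix): number {
  if (student.length !== anchor.length) {
    throw new Error("student and anchor must have the same number of patches");
  }
  const gs = gramMatrix(l2NormalizeRows(student));
  const ga = gramMatrix(l2NormalizeRows(anchor));
  let loss = 0;
  for (let i = 0; i < gs.length; i++) {
    for (let j = 0; j < gs.length; j++) {
      const diff = gs[i][j] - ga[i][j];
      loss += diff * diff;
    }
  }
  return loss;
}

// Two patches that the anchor considers identical but the student has
// pulled apart: the off-diagonal similarities differ by 1 each, so loss = 2.
console.log(gramAnchoringLoss([[1, 0], [0, 1]], [[1, 0], [1, 0]]));
```

    Because only the similarity structure is compared, the student's features can rotate or rescale freely as long as the relationships between patches stay close to the anchor's.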

    Architecture

    DINOv3 is a Vision Transformer with patch size 16. The family scales from ViT-S (21M parameters) to ViT-7B (6.7B parameters). Gram anchoring stabilizes dense features during extended training. The family also includes distilled ConvNeXt backbones, and the models support flexible input resolution and post-hoc text alignment.
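
    As a worked example of what patch size 16 means in practice, the sketch below computes how many patch tokens an image of a given resolution produces. The `patchGrid` helper is illustrative. It counts patch tokens only (not any extra special tokens the model may add) and assumes the image has already been resized so both sides are multiples of the patch size.

```typescript
const PATCH_SIZE = 16; // patch size stated in the architecture description

interface PatchGrid {
  rows: number;
  cols: number;
  patches: number;
}

// Illustrative helper: number of non-overlapping 16x16 patches in an image.
function patchGrid(
  width: number,
  height: number,
  patchSize: number = PATCH_SIZE
): PatchGrid {
  if (width % patchSize !== 0 || height % patchSize !== 0) {
    throw new Error(`Image sides must be multiples of ${patchSize}`);
  }
  const cols = width / patchSize;
  const rows = height / patchSize;
  return { rows, cols, patches: rows * cols };
}

console.log(patchGrid(224, 224)); // 14 x 14 grid, 196 patches
console.log(patchGrid(512, 512)); // 32 x 32 grid, 1024 patches
```

    The patch count grows with the square of the side length, which is why higher input resolutions give denser feature maps at a higher compute cost.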

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    // Create a client; replace "API_KEY" with your own key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest an image by URL and extract embeddings with dinov3-large.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/satellite.tiff" },
      feature_extractors: [{
        name: "image_embedding",
        version: "v1",
        params: { model_id: "facebook/dinov3-large" }
      }]
    });

    Capabilities

    • Gram anchoring for stable dense feature training
    • Scales up to 6.7B parameters (ViT-7B)
    • Trained on 1.7B web + 493M satellite images
    • ViT and ConvNeXt backbone variants
    • Multi-domain: natural images and satellite/aerial imagery

    Use Cases on Mixpeek

    • High-fidelity visual search across massive image collections
    • Satellite and aerial imagery analysis
    • Dense segmentation and depth estimation
    • Foundation for downstream classification without fine-tuning
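
    The visual search use case above comes down to comparing embedding vectors. The sketch below ranks a small in-memory index by cosine similarity to a query embedding. It is a generic illustration, not Mixpeek's retriever: the `IndexedImage` type, the `topK` helper, and the 3-dimensional toy vectors are assumptions made for readability, and real embeddings would have the dimensionality given in the specification.

```typescript
interface IndexedImage {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimensionality");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force search: score every indexed image and keep the k best.
function topK(query: number[], index: IndexedImage[], k: number) {
  return index
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy index with made-up 3-dimensional vectors.
const toyIndex: IndexedImage[] = [
  { id: "harbor", embedding: [1, 0, 0] },
  { id: "forest", embedding: [0, 1, 0] },
  { id: "coast", embedding: [1, 1, 0] },
];

console.log(topK([1, 0, 0], toyIndex, 2)); // "harbor" first, then "coast"
```

    Brute-force scoring is fine for a handful of images. Large collections normally rely on an approximate nearest-neighbor index instead, which a hosted vector search retriever would handle for you.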

    Specification

    Framework: PyTorch
    Organization: facebook
    Feature: Visual Embeddings
    Output: 768-dim vector
    Modalities: video, image
    Retriever: Vector Search
    Parameters: 300M (Large), 6.7B (ViT-7B)
    License: Apache 2.0
    Downloads/mo: 450K

    Research Paper

    DINOv3

    arxiv.org

    Build a pipeline with dinov3-large

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
