
    CLIP-ViT-bigG-14-laion2B-39B-b160k

    by laion

    Open-source CLIP trained on 2B image-text pairs at giant scale

    890K downloads/month
    1.8B parameters
    Identifiers
    Model ID
    laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
    Feature URI
    mixpeek://image_extractor@v1/laion_openclip_bigG_v1

    Overview

    OpenCLIP is the open-source reproduction of CLIP by the LAION/ML Foundations community. This ViT-bigG/14 variant was trained on LAION-2B (2 billion image-text pairs), achieving up to 85.4% ImageNet zero-shot accuracy — surpassing OpenAI's original CLIP.

    On Mixpeek, OpenCLIP provides the highest-accuracy open-weight visual embeddings for text-to-image and image-to-image retrieval at scale.
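Because text and image embeddings live in one shared vector space, cross-modal retrieval reduces to ranking by cosine similarity. A minimal sketch of that scoring step (toy 4-dim vectors stand in for the model's 1280-dim output; the file names are hypothetical):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy 4-dim embeddings standing in for OpenCLIP's 1280-dim vectors.
const textQuery = [0.9, 0.1, 0.0, 0.4];
const images: Record<string, number[]> = {
  "red-sneaker.jpg": [0.8, 0.2, 0.1, 0.5],
  "blue-couch.jpg": [0.1, 0.9, 0.7, 0.0],
};

// Rank candidate images by similarity to the text query, highest first.
const ranked = Object.entries(images)
  .map(([id, v]) => ({ id, score: cosine(textQuery, v) }))
  .sort((a, b) => b.score - a.score);
```

In production the ranking runs inside a vector index rather than a loop, but the score being optimized is the same.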

    Architecture

    Vision Transformer (ViT-bigG/14) with ~1.8B vision parameters, trained with contrastive learning on the LAION-2B dataset for a total of 39B samples seen. Produces 1280-dimensional embeddings projected into a shared vision-text space.
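The contrastive objective can be sketched as a temperature-scaled softmax over text-image similarities: each text should assign most probability mass to its paired image. The numbers below are toy stand-ins; the logit scale of 100 approximates CLIP's learned temperature:

```typescript
// Numerically stable softmax over a vector of logits.
function softmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// Cosine similarities of one text embedding to 3 candidate images
// (index 0 is the true pair), scaled by a learned temperature.
const sims = [0.31, 0.12, 0.05];
const logitScale = 100;
const probs = softmax(sims.map(s => s * logitScale));
// After scaling, the matching pair dominates the distribution.
```

Training minimizes cross-entropy against the true pairing in both directions (text-to-image and image-to-text).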

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest an image and embed it with the OpenCLIP ViT-bigG/14 extractor.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/product.jpg" },
      feature_extractors: [{
        name: "image_embedding",
        version: "v1",
        params: { model_id: "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k" }
      }]
    });

    Capabilities

    • 85.4% ImageNet zero-shot accuracy
    • Trained on 2B open image-text pairs
    • 1280-dimensional dense embeddings
    • Strongest open-weight CLIP variant
    • Supports both ViT and ConvNeXt backbones

    Use Cases on Mixpeek

    • Large-scale visual search with maximum accuracy
    • Cross-modal retrieval across image and text
    • E-commerce product similarity and discovery
    • Foundation model for downstream vision-language tasks

    Specification

    Framework: HF
    Organization: laion
    Feature: Visual Embeddings
    Output: 1280-dim vector
    Modalities: video, image
    Retriever: Vector Search
    Parameters: 1.8B
    License: MIT
    Downloads/mo: 890K
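For vector search, embeddings are commonly L2-normalized before indexing so that a plain dot product equals cosine similarity, which many vector stores compute fastest. A minimal sketch of that preprocessing step (a toy 2-dim vector stands in for the 1280-dim output):

```typescript
// L2-normalize an embedding so that dot product == cosine similarity.
function l2normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map(x => x / norm);
}

const raw = [3, 4];              // toy vector; real output is 1280-dim
const unit = l2normalize(raw);   // unit-length version of raw
```

Whether normalization is needed depends on the index's distance metric; with a cosine metric the store typically normalizes internally.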

    Research Paper

    Reproducible Scaling Laws for Contrastive Language-Image Learning

    arxiv.org

    Build a pipeline with CLIP-ViT-bigG-14-laion2B-39B-b160k

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
