    HF · Visual Embeddings · MIT

    marqo-fashionSigLIP

    by Marqo

    Fashion-domain visual embedding model fine-tuned with Generalised Contrastive Learning

    965K downloads/month
    86M parameters

    Identifiers

    Model ID: Marqo/marqo-fashionSigLIP
    Feature URI: mixpeek://image_extractor@v1/marqo_fashionsiglip_v1

    Overview

    Marqo FashionSigLIP is a ViT-B/16-SigLIP model fine-tuned on over 1M fashion products using Generalised Contrastive Learning (GCL). Unlike generic CLIP models, it is trained on rich fashion metadata (categories, styles, colors, materials, and fine-grained product details). Marqo reports up to a 57% improvement in MRR and recall over previous fashion-specific models such as FashionCLIP 2.0.

    On Mixpeek, FashionSigLIP powers domain-specific visual search for e-commerce and retail, where generic embeddings miss style nuances like fabric texture, color palette, and silhouette that are critical for product discovery and recommendation.

    Architecture

    ViT-B/16-SigLIP (webli) backbone fine-tuned with Generalised Contrastive Learning on fashion-specific metadata (categories, styles, colors, materials, keywords). Sigmoid contrastive loss for efficient pairwise training. 768-dimensional shared image-text embedding space.
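
The sigmoid contrastive loss mentioned above can be illustrated with a small pure-Python sketch. It follows the generic SigLIP formulation, in which every image-text pair in a batch is scored independently as a binary match or non-match. It is not Marqo's GCL training code: the function names, the toy 2-dimensional vectors, and the fixed `temperature` and `bias` values are assumptions chosen for illustration (in SigLIP both are learned).

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def sigmoid_pairwise_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Generic SigLIP-style loss over a batch of paired embeddings.

    Pair (i, i) is a positive (label +1); every other (i, j) is a
    negative (label -1). temperature and bias are fixed here for
    illustration only.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            logit = temperature * dot(imgs[i], txts[j]) + bias
            total += log_sigmoid(label * logit)
    return -total / n

# Toy 2-dim embeddings (made-up values, not model outputs).
images = [[1.0, 0.0], [0.0, 1.0]]
matching_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

print(sigmoid_pairwise_loss(images, matching_texts))
print(sigmoid_pairwise_loss(images, swapped_texts))
```

Because each pair contributes its own binary term, the loss needs no softmax normalization across the batch, which is what makes the pairwise training efficient. Correctly matched pairs yield a lower loss than swapped ones.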

    Mixpeek SDK Integration

    from mixpeek import Mixpeek

    mx = Mixpeek(api_key="YOUR_KEY")
    mx.ingest(
        collection_id="fashion-catalog",
        source="s3://product-images/",
        extractors=[{
            "type": "visual_embedding",
            "model": "Marqo/marqo-fashionSigLIP",
            "output_feature": "fashion_embedding"
        }]
    )

    Capabilities

    • Fashion-optimized 768-dim visual embeddings
    • Text-to-image and image-to-image product search
    • Fine-grained attribute awareness (color, material, style, silhouette)
    • Up to 57% MRR improvement over FashionCLIP 2.0 (as reported by Marqo)
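
Since image and text embeddings share one space, text-to-image and image-to-image search both reduce to the same nearest-neighbor ranking. The sketch below shows that idea with cosine similarity in pure Python. The product IDs and the 3-dimensional vectors are made up for illustration (real embeddings from this model are 768-dimensional), and `search` is a hypothetical helper, not part of the Mixpeek SDK.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, catalog, top_k=2):
    """Rank catalog items by cosine similarity to the query embedding."""
    scored = [
        (product_id, cosine_similarity(query_embedding, embedding))
        for product_id, embedding in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dim vectors standing in for 768-dim embeddings (made-up values).
catalog = {
    "red-silk-dress": [0.9, 0.1, 0.0],
    "red-cotton-tee": [0.6, 0.6, 0.1],
    "blue-denim-jacket": [0.0, 0.2, 0.9],
}

# Hypothetical query embeddings: one from a text prompt, one from a photo.
text_query = [0.8, 0.2, 0.0]
image_query = [0.1, 0.1, 0.95]

text_hits = search(text_query, catalog)
image_hits = search(image_query, catalog)
print(text_hits)
print(image_hits)
```

In production the linear scan would typically be replaced by a vector index, but the ranking criterion stays the same.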

    Use Cases on Mixpeek

    • Visual product search for fashion e-commerce (find similar garments by style)
    • Automated product categorization and attribute tagging for catalogs
    • Recommendation engines that understand fashion-specific visual similarity

    Benchmarks

    Dataset | Metric | Score | Source
    Fashion Product Retrieval | MRR improvement vs FashionCLIP 2.0 | +57% | Marqo, 2024 (marqo-FashionCLIP GitHub)
    Fashion Category Classification | Recall improvement vs FashionCLIP 2.0 | +57% | Marqo, 2024 (marqo-FashionCLIP GitHub)
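
For readers unfamiliar with the metrics in the table, here is a minimal sketch of how MRR and recall@k are commonly computed for retrieval. It uses the standard textbook definitions. The exact evaluation protocol behind Marqo's reported numbers is not described on this page, so the function names, query IDs, and rankings below are illustrative assumptions only.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the first relevant item per query (0 if none)."""
    total = 0.0
    for query_id, ranking in rankings.items():
        for rank, item in enumerate(ranking, start=1):
            if item in relevant[query_id]:
                total += 1.0 / rank
                break
    return total / len(rankings)

def recall_at_k(rankings, relevant, k):
    """Average fraction of each query's relevant items found in the top k."""
    total = 0.0
    for query_id, ranking in rankings.items():
        hits = len(set(ranking[:k]) & relevant[query_id])
        total += hits / len(relevant[query_id])
    return total / len(rankings)

# Made-up retrieval results for two queries.
rankings = {
    "q1": ["tee", "dress", "jacket"],
    "q2": ["boots", "sandals", "heels"],
}
relevant = {
    "q1": {"dress"},   # first relevant item at rank 2
    "q2": {"boots"},   # first relevant item at rank 1
}

mrr = mean_reciprocal_rank(rankings, relevant)
r_at_1 = recall_at_k(rankings, relevant, k=1)
print(mrr, r_at_1)
```

MRR rewards placing the first correct item near the top of the list, while recall@k measures how many of the correct items appear within the first k results.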

    Performance

    Input Size: 224x224 px
    Embedding Dim: 768
    GPU Latency: ~6 ms / image (A100)
    CPU Latency: ~70 ms / image
    GPU Throughput: ~165 images/sec (A100)
    GPU Memory: ~1.1 GB
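
The figures above lend themselves to quick capacity planning. The following back-of-the-envelope sketch checks that the quoted latency and throughput agree with each other and estimates embedding time and raw vector storage. The catalog size of one million images, the float32 storage format, and single-stream processing are assumptions for illustration, not figures from this page.

```python
# Figures quoted in the Performance section.
GPU_LATENCY_MS = 6.0        # per image, A100
CPU_LATENCY_MS = 70.0       # per image
GPU_THROUGHPUT = 165.0      # images per second, A100
EMBEDDING_DIM = 768

# Assumptions for this sketch (not from the page).
CATALOG_SIZE = 1_000_000    # hypothetical number of product images
BYTES_PER_VALUE = 4         # assumes vectors are stored as float32

# 1000 ms / 6 ms per image is close to the quoted ~165 images/sec.
implied_gpu_throughput = 1000.0 / GPU_LATENCY_MS

# Time to embed the whole catalog, one processing stream.
gpu_hours = CATALOG_SIZE / GPU_THROUGHPUT / 3600.0
cpu_hours = CATALOG_SIZE * CPU_LATENCY_MS / 1000.0 / 3600.0

# Raw vector storage, excluding any index overhead or metadata.
bytes_per_vector = EMBEDDING_DIM * BYTES_PER_VALUE
index_gib = CATALOG_SIZE * bytes_per_vector / 2**30

print(f"implied GPU throughput: {implied_gpu_throughput:.1f} images/sec")
print(f"GPU embedding time: {gpu_hours:.2f} h, CPU: {cpu_hours:.2f} h")
print(f"raw vector storage: {index_gib:.2f} GiB")
```

Under these assumptions a single A100 embeds the catalog in under two hours, a single CPU stream needs most of a day, and the raw vectors occupy a little under 3 GiB.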

    Specification

    Framework: HF
    Organization: Marqo
    Feature: Visual Embeddings
    Output: 768-dim vector
    Modalities: video, image
    Retriever: Vector Search
    Parameters: 86M
    License: MIT
    Downloads/mo: 965K

    Build a pipeline with marqo-fashionSigLIP

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
