marqo-fashionSigLIP
by Marqo
Fashion-domain visual embedding model fine-tuned with Generalised Contrastive Learning
Marqo/marqo-fashionSigLIP
mixpeek://image_extractor@v1/marqo_fashionsiglip_v1
Overview
Marqo FashionSigLIP is a ViT-B/16-SigLIP model fine-tuned on over 1M fashion products using Generalised Contrastive Learning (GCL). Unlike generic CLIP models, it is trained on rich fashion metadata (categories, styles, colors, materials, and fine-grained product details). Marqo reports up to a 57% improvement in MRR and recall over previous fashion-specific models.
On Mixpeek, FashionSigLIP powers domain-specific visual search for e-commerce and retail, where generic embeddings miss style nuances like fabric texture, color palette, and silhouette that are critical for product discovery and recommendation.
Architecture
The model uses a ViT-B/16-SigLIP (webli) backbone fine-tuned with Generalised Contrastive Learning on fashion-specific metadata (categories, styles, colors, materials, keywords). Training uses a sigmoid contrastive loss, which scores each image-text pair independently and makes pairwise training efficient. Images and text are embedded into a shared 768-dimensional space.
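To make the sigmoid contrastive loss concrete, here is a minimal sketch of a SigLIP-style pairwise loss in plain Python. It is not Marqo's GCL training code: the function names, the toy 2-dimensional embeddings, and the `temperature` and `bias` defaults are illustrative assumptions. Matching image-text pairs (the diagonal) are treated as positives and every other pairing as a negative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid_pairwise_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """SigLIP-style loss over all image-text pairs in a batch.

    image_embs[i] and text_embs[i] are assumed to be a matching pair and
    to be L2-normalised. temperature and bias are learnable in SigLIP;
    the defaults here are illustrative assumptions only.
    """
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0  # +1 for matches, -1 otherwise
            logit = temperature * dot(image_embs[i], text_embs[j]) + bias
            total += log_sigmoid(label * logit)
    return -total / n


# Toy batch: two images and two captions (hypothetical 2-dim embeddings).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = sigmoid_pairwise_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = sigmoid_pairwise_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly aligned captions produce a lower loss than the swapped ones. Each pair contributes its own independent term, so no softmax normalisation over the batch is needed.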
Mixpeek SDK Integration
    from mixpeek import Mixpeek

    mx = Mixpeek(api_key="YOUR_KEY")

    mx.ingest(
        collection_id="fashion-catalog",
        source="s3://product-images/",
        extractors=[
            {
                "type": "visual_embedding",
                "model": "Marqo/marqo-fashionSigLIP",
                "output_feature": "fashion_embedding",
            }
        ],
    )
Capabilities
- Fashion-optimized 768-dim visual embeddings
- Text-to-image and image-to-image product search
- Fine-grained attribute awareness (color, material, style, silhouette)
- Up to 57% MRR improvement over FashionCLIP 2.0
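Because images and text share one embedding space, text-to-image and image-to-image search both reduce to nearest-neighbour ranking by cosine similarity. The sketch below illustrates that idea with plain Python. The product IDs, the 3-dimensional vectors (standing in for 768-dimensional embeddings), and the helper names are hypothetical, and this is not how Mixpeek implements retrieval internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def top_k(query_vec, catalog, k=2):
    """Return the k product IDs whose embeddings are closest to query_vec."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]


# Hypothetical catalog: product ID -> toy embedding.
catalog = {
    "red-dress": [1.0, 0.0, 0.0],
    "blue-jeans": [0.0, 1.0, 0.0],
    "red-skirt": [0.9, 0.1, 0.0],
}

# The query vector could come from a text prompt or from another image;
# the ranking step is the same in both cases.
results = top_k([1.0, 0.0, 0.0], catalog, k=2)
```

A production system would typically replace the linear scan with an approximate nearest-neighbour index, but the scoring function stays the same.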
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| Fashion Product Retrieval | MRR improvement vs FashionCLIP 2.0 | +57% | Marqo, 2024 — marqo-FashionCLIP GitHub |
| Fashion Category Classification | Recall improvement vs FashionCLIP 2.0 | +57% | Marqo, 2024 — marqo-FashionCLIP GitHub |
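For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how MRR and recall@k are commonly computed for retrieval. It assumes one relevant item per query, and the function names and toy rankings are illustrative. It does not reproduce Marqo's evaluation setup.

```python
def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the relevant item per query (0 if not retrieved)."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        for position, item in enumerate(ranking, start=1):
            if item == target:
                total += 1.0 / position
                break
    return total / len(rankings)


def recall_at_k(rankings, targets, k):
    """Fraction of queries whose relevant item appears in the top k results."""
    hits = sum(1 for ranking, target in zip(rankings, targets)
               if target in ranking[:k])
    return hits / len(rankings)


# Toy example: three queries, each with a single relevant item "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]

mrr = mean_reciprocal_rank(rankings, targets)
r_at_1 = recall_at_k(rankings, targets, k=1)
r_at_2 = recall_at_k(rankings, targets, k=2)
```

MRR rewards placing the relevant product near the top of the list, while recall@k only checks whether it appears within the first k results.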
Build a pipeline with marqo-fashionSigLIP
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.