sam-vit-huge
by facebook
Promptable foundation model for image segmentation
facebook/sam-vit-huge
Overview
SAM (Segment Anything Model) is Meta's foundation model for image segmentation. Given prompts such as points, boxes, or masks, it produces high-quality object masks. It was trained on SA-1B, the largest segmentation dataset to date, with over 1 billion masks on 11 million images.
On Mixpeek, SAM powers pixel-level object segmentation for precise content understanding, enabling mask-based filtering and region-specific feature extraction.
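The mask-based filtering mentioned above can be sketched in a few lines. This is an illustrative example, not the Mixpeek API: the `Mask` shape, `areaFraction`, and `filterByArea` are our own names, assuming binary masks stored as flat 0/1 arrays.

```typescript
// Illustrative mask-based filtering over binary segmentation masks.
// Assumes each mask is a flat row-major 0/1 array (hypothetical shape,
// not a Mixpeek type).
type Mask = { label: string; data: Uint8Array; width: number; height: number };

// Fraction of image pixels covered by the mask.
function areaFraction(mask: Mask): number {
  let on = 0;
  for (const px of mask.data) if (px === 1) on++;
  return on / (mask.width * mask.height);
}

// Keep only masks covering at least `minFraction` of the image,
// e.g. to drop tiny spurious segments before feature extraction.
function filterByArea(masks: Mask[], minFraction: number): Mask[] {
  return masks.filter((m) => areaFraction(m) >= minFraction);
}
```

A downstream pipeline could run region-specific extractors only on the masks that survive this filter.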
Architecture
A ViT-H image encoder (632M parameters) paired with a lightweight prompt encoder and mask decoder. The decoder produces 256x256 low-resolution masks that are upsampled to full image resolution. Supports multiple prompt types: points, boxes, and masks.
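The prompt types above can be written down as simple data shapes. These TypeScript definitions are our own illustration (not an official SAM or Mixpeek API); coordinates are in image pixels, and point labels follow SAM's convention of 1 for foreground and 0 for background.

```typescript
// Illustrative shapes for SAM's three prompt types (names are ours).
type PointPrompt = { x: number; y: number; label: 0 | 1 }; // 1 = foreground, 0 = background
type BoxPrompt = { x1: number; y1: number; x2: number; y2: number }; // xyxy corners
type MaskPrompt = { data: Float32Array; size: 256 }; // low-res mask, matching the decoder grid

// The decoder's 256x256 low-res mask is upsampled to full resolution;
// this maps a low-res cell (row, col) back to approximate pixel coordinates.
function lowResToPixel(
  row: number,
  col: number,
  imgW: number,
  imgH: number
): [number, number] {
  return [Math.round((col / 256) * imgW), Math.round((row / 256) * imgH)];
}
```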
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";
const mx = new Mixpeek({ apiKey: "API_KEY" });
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/image.jpg" },
  feature_extractors: [{
    name: "segmentation",
    version: "v1",
    params: { model_id: "facebook/sam-vit-huge" }
  }]
});

Capabilities
- Promptable segmentation with points, boxes, or masks
- Automatic mask generation for everything in an image
- Zero-shot transfer competitive with supervised models
- Trained on 1 billion masks (SA-1B dataset)
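The "segment everything" mode in the list above works by seeding SAM with a regular grid of foreground point prompts across the image and keeping the resulting masks. A minimal sketch of the grid sampling (the function name and default grid size are our assumptions; 32 points per side is a common choice):

```typescript
// Sketch of how automatic mask generation seeds SAM: a regular grid of
// foreground point prompts at cell centers (illustrative, not an
// official API).
type Point = { x: number; y: number; label: 1 };

function gridPrompts(imgW: number, imgH: number, perSide = 32): Point[] {
  const pts: Point[] = [];
  for (let i = 0; i < perSide; i++) {
    for (let j = 0; j < perSide; j++) {
      pts.push({
        x: ((j + 0.5) / perSide) * imgW, // cell centers, not corners
        y: ((i + 0.5) / perSide) * imgH,
        label: 1, // every seed is a foreground point
      });
    }
  }
  return pts;
}
```

Each point prompt yields candidate masks, which are then deduplicated and quality-filtered to produce the final "everything" segmentation.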
Research Paper
Segment Anything (arxiv.org)
Build a pipeline with sam-vit-huge
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.