sam3
by facebook
Concept-level segmentation with open-vocabulary detection and video tracking
facebook/sam3
mixpeek://image_extractor@v1/facebook_sam3_v1
Overview
SAM 3 is Meta's unified foundation model for concept-level segmentation. It detects, segments, and tracks objects using open-vocabulary text prompts or visual exemplars, handling 270K+ unique concepts. It bridges the gap between detection and segmentation in a single model.
On Mixpeek, SAM 3 enables concept-driven content analysis: specify a concept in text, and SAM 3 finds, segments, and tracks every instance of it across images and video.
Architecture
SAM 3 uses a decoupled detector-tracker architecture in which the detector and the tracker share a vision encoder. The model has 848M total parameters. A presence token helps it discriminate between closely related prompts. It was trained on 4M+ automatically annotated concepts.
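The presence token mentioned above can be pictured as a gate on per-object scores: one global score estimates whether the prompted concept appears in the image at all, and each candidate's own match score is multiplied by it. The following is a minimal sketch of that idea only. It is not Meta's or Mixpeek's code, and the names Proposal, gateByPresence, matchScore, presenceScore, and the 0.5 threshold are all assumptions made for illustration.

```typescript
// Hypothetical illustration of presence-gated scoring.
// All names and the default threshold are assumptions, not part of SAM 3 or the Mixpeek SDK.
interface Proposal {
  id: number;
  matchScore: number; // assumed: how well this candidate matches the prompt, in [0, 1]
}

interface ScoredProposal {
  id: number;
  score: number; // matchScore multiplied by the global presence score
}

function gateByPresence(
  proposals: Proposal[],
  presenceScore: number, // assumed: probability the concept is present in the image, in [0, 1]
  threshold: number = 0.5,
): ScoredProposal[] {
  return proposals
    // Scale every candidate by the single image-level presence score.
    .map((p) => ({ id: p.id, score: p.matchScore * presenceScore }))
    // Keep only candidates whose combined score clears the threshold.
    .filter((p) => p.score >= threshold);
}

const proposals: Proposal[] = [
  { id: 0, matchScore: 0.9 },
  { id: 1, matchScore: 0.6 },
];

// Concept very likely present: both candidates survive.
const present = gateByPresence(proposals, 0.95);
// Concept very likely absent: every candidate is suppressed.
const absent = gateByPresence(proposals, 0.1);

console.log(present.map((p) => p.id), absent.length);
```

Separating "is the concept here at all?" from "which region matches?" means a confident-looking candidate can still be rejected when the prompt names something similar but absent, which is the kind of closely related prompt the section refers to.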
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [{
    name: "segmentation",
    version: "v1",
    params: { model_id: "facebook/sam3" }
  }]
});
Capabilities
- Open-vocabulary detection + segmentation (270K+ concepts)
- Video tracking with mask propagation
- Text and visual exemplar prompts
- Concept-level exhaustive segmentation
- Reported to outperform OWLv2, DINO-X, and Gemini 2.5 on the benchmarks in the SAM 3 paper
Use Cases on Mixpeek
Specification
Research Paper
SAM 3: Segment Anything with Concepts
arxiv.org
Build a pipeline with sam3
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.