sam2.1-hiera-large
by facebook
Unified promptable segmentation for images and video with streaming memory
facebook/sam2.1-hiera-large
mixpeek://image_extractor@v1/facebook_sam2_large_v1
Overview
SAM 2 extends SAM to video with a streaming memory architecture for real-time processing. It is 6x faster than SAM on image segmentation while more accurate, and it is the first foundation model that segments and tracks objects across video frames from prompts.
On Mixpeek, SAM 2 enables video-native segmentation — track objects across frames, segment specific items at any point in a video, and extract per-object features over time.
Architecture
A Hiera image encoder paired with streaming memory for temporal context. SAM 2.1 Large: 224.4M parameters, 39.5 FPS on an A100. Memory attention modules propagate masks across frames by attending over stored frame memories, so the full image encoder runs only once per frame rather than being re-run for past frames.
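The streaming idea above can be sketched in a few lines. This is an illustrative toy, not the real SAM 2 implementation: the "encoder" and "memory attention" below are stand-ins, and the memory-bank capacity is an assumed small FIFO.

```typescript
// Illustrative sketch of streaming-memory mask propagation (NOT SAM 2 itself):
// each frame is encoded exactly once, and the next mask is predicted by
// attending over a small FIFO bank of recent frame memories.
type Memory = { frame: number; embedding: number[]; mask: number[] };

class StreamingTracker {
  private bank: Memory[] = [];
  constructor(private capacity = 7) {} // assumed small FIFO of recent memories

  // Stand-in for the Hiera encoder: runs once per frame.
  encode(frame: number[]): number[] {
    return frame.map((v) => v / 255);
  }

  // Stand-in for memory attention: majority vote over stored masks.
  step(frameIdx: number, frame: number[]): number[] {
    const emb = this.encode(frame);
    const mask =
      this.bank.length === 0
        ? emb.map((v) => (v > 0.5 ? 1 : 0)) // first frame: fake "prompt" by threshold
        : emb.map((_, i) => {
            const votes = this.bank.reduce((sum, m) => sum + m.mask[i], 0);
            return votes / this.bank.length >= 0.5 ? 1 : 0;
          });
    this.bank.push({ frame: frameIdx, embedding: emb, mask });
    if (this.bank.length > this.capacity) this.bank.shift(); // FIFO eviction
    return mask;
  }
}

const tracker = new StreamingTracker();
const video = [
  [200, 30, 220],
  [210, 40, 230],
  [190, 20, 215],
];
const masks = video.map((f, i) => tracker.step(i, f));
console.log(masks); // the prompted mask propagates across all frames
```

The point of the structure: per-frame cost stays constant because only the lightweight memory-attention step looks backward, never the encoder.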
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";
const mx = new Mixpeek({ apiKey: "API_KEY" });
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [{
    name: "segmentation",
    version: "v1",
    params: { model_id: "facebook/sam2.1-hiera-large" }
  }]
});
Capabilities
- Video object segmentation and tracking
- 6x faster than SAM on images
- Streaming memory architecture for real-time video
- Multi-object tracking with mask propagation
- Image segmentation with improved accuracy
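The real-time claim above can be sanity-checked against the spec numbers (39.5 FPS on an A100): the per-frame budget comes out under the 33.3 ms a 30 FPS video stream allows.

```typescript
// Back-of-envelope check using the figures from the Architecture section.
const modelFps = 39.5;                 // SAM 2.1 Large on A100
const perFrameMs = 1000 / modelFps;    // ~25.3 ms per frame
const streamFrameMs = 1000 / 30;       // ~33.3 ms available per frame of a 30 FPS source
const realtimeAt30Fps = perFrameMs < streamFrameMs;
console.log(perFrameMs.toFixed(1), realtimeAt30Fps);
```

So at 30 FPS there is roughly 8 ms of headroom per frame; a 60 FPS source (16.7 ms per frame) would not keep up on this hardware.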
Research Paper
SAM 2: Segment Anything in Images and Videos
arxiv.org
Build a pipeline with sam2.1-hiera-large
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
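As a sketch of what such a pipeline config might look like, the shape below mirrors the ingest call shown earlier. The second "embedding" extractor is an assumption for illustration only, not a confirmed Mixpeek extractor name.

```typescript
// Hypothetical multi-extractor pipeline config, extending the ingest
// example above. Only "segmentation" is documented on this page;
// the "embedding" stage is an illustrative assumption.
const pipeline = {
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [
    {
      name: "segmentation",
      version: "v1",
      params: { model_id: "facebook/sam2.1-hiera-large" },
    },
    {
      name: "embedding", // assumed second stage for per-object features
      version: "v1",
    },
  ],
};
console.log(pipeline.feature_extractors.map((e) => e.name));
```

Chaining extractors this way is what lets each tracked object's masks feed downstream feature extraction and retrieval stages.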
Open Pipeline Builder