As the former lead of MongoDB's Search team, Ethan noticed that the most common problem customers faced was building indexing and search infrastructure on top of their S3 buckets. Mixpeek was born.
We tested Gemini, Twelve Labs Marengo, X-CLIP, SigLIP 2, and InternVideo2 on text-to-video retrieval with graded relevance. The results surprised us.
Google's Gemini Embedding 2 embeds images, PDFs, and text together in a single API call. Here's how we integrated it into Mixpeek's feature extractor pipeline, the production numbers, and where multi-file embedding beats single-chunk approaches.
How we built query preprocessing into Mixpeek's feature_search stage — decompose a 500MB video into chunks, embed in parallel, fuse results. Zero API surface change for callers.
Sports broadcasters cut 4-8 hour editing sessions to 15 minutes using AI video analysis. Learn how to build automated highlight detection, archive search, and performance analytics pipelines for any sport.
We run 20+ ML models in parallel across video, image, and document pipelines. Here's the Ray architecture behind it: custom resource isolation, flexible actor pools, distributed Qdrant writes, and the lessons we learned the hard way.
6,000+ ZIP codes straddle congressional district lines. At ZIP+4 precision, federal, state, and local disclaimer requirements can all apply simultaneously. Here's how multimodal AI solves what static rules engines can't.
Classify text, images, and video into 700+ IAB Content Taxonomy categories using multimodal AI. Learn how it works under the hood and how to extend it for your contextual targeting needs.
Instead of polling with an LLM on a cron schedule, Retriever Alerts evaluate semantic conditions at ingestion time. Vector math instead of inference calls. Event-driven instead of scheduled. Three API calls to set up.
Nurses spend 40% of their time on documentation. MDS coordinators abstract charts for 3-4 hours per assessment. PDPM revenue goes uncaptured. The root cause is that clinical documentation is inherently multimodal — and most tools only handle text.
How semantic chunking improves RAG quality by splitting content at natural boundaries rather than fixed token counts. Covers text, documents, video, and audio.
How agentic retrieval goes beyond traditional RAG by letting AI agents dynamically plan and execute multi-step search strategies with tool calling.
How AI-powered video intelligence extracts structured, searchable information from raw footage — covering scene detection, transcription, face recognition, and temporal indexing.
A clear comparison of keyword, semantic, and hybrid search with practical guidance on when to use each approach in production systems.
A practical guide to building search that works across text, images, video, and audio using shared embedding spaces and retrieval pipelines.
How we built a fast, efficient, and production-ready vision-language model server without Python.