
    Mux

    Turn every Mux video into a searchable, intelligent asset

    Sync videos from Mux into Mixpeek for automatic multimodal extraction — scene understanding, object detection, face identity, OCR, and transcription. Build visual search retrievers that let users find the exact frame, scene, or spoken word across your entire video library.

    Measurable impact from day one

    What teams see after connecting Mux to Mixpeek

    95% less manual review: teams find the exact frame in seconds instead of scrubbing through hours of footage.

    0 manual indexing steps: every Mux upload is decomposed and made searchable within minutes, with no human intervention.

    40–60% lower processing costs: selective sync filters ensure only relevant assets are indexed, eliminating wasted processing.

    Under 200 ms search latency: visual, face, transcript, and OCR queries across 10,000+ hours of video.

    100% audit coverage: every extraction step, from Mux ingest to search index, is logged and audit-ready out of the box.

    Under 4 hours to first query: connect Mux, configure filters, and run your first search query.

    Finding a scene in your video library

    Before: scrub through footage manually; rely on hand-entered tags; spend hours per search request; no face or speech search.

    After: type a natural-language query; search auto-extracted visual and audio features; get results in under 200 ms; query faces, transcripts, and OCR in one search.

    The Problem

    Video platforms store thousands of hours of content, but the footage itself is a black box. Finding a specific scene, verifying talent rights across a library, or searching for on-screen text means scrubbing through videos manually. Metadata is limited to what was entered at upload time — titles and tags that go stale fast. Teams waste hours on manual review that should take seconds.

    The Solution

    Mixpeek connects directly to Mux via selective sync. When a video lands in Mux, Mixpeek automatically decomposes it into frames and audio segments, then runs multimodal extractors — visual embeddings, object detection, face recognition, OCR, and speech transcription. Every extracted feature is indexed into a retriever so your team can search across scenes, objects, spoken words, and on-screen text from a single query.

    Pipeline Architecture


    1. Mux Selective Sync (Webhook + Filters)

    Videos uploaded to Mux trigger a webhook. Selective sync filters decide which assets flow into Mixpeek based on metadata, passthrough flags, or asset tags.
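A minimal sketch of a selective-sync filter, assuming the Mux webhook event carries a `type` (such as `video.asset.ready`) and a `data` object with `status`, `passthrough`, and a `meta` object. The `SyncFilter` class, the `mixpeek:index` flag, and the convention of storing comma-separated flags in `passthrough` are all hypothetical illustrations, not Mixpeek's configuration API; check Mux's webhook reference for the exact payload shape.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set


@dataclass
class SyncFilter:
    """Hypothetical selective-sync rules; field names are illustrative, not Mixpeek's API."""
    event_types: Set[str] = field(default_factory=lambda: {"video.asset.ready"})
    passthrough_flag: Optional[str] = "mixpeek:index"  # hypothetical flag convention
    required_meta: Dict[str, str] = field(default_factory=dict)  # exact-match metadata

    def should_sync(self, event: Dict[str, Any]) -> bool:
        """Return True if a Mux webhook event should be forwarded for ingest."""
        if event.get("type") not in self.event_types:
            return False
        asset = event.get("data") or {}
        if asset.get("status") != "ready":
            return False
        if self.passthrough_flag is not None:
            # Treat passthrough as a comma-separated list of flags (assumed convention).
            flags = {f.strip() for f in (asset.get("passthrough") or "").split(",")}
            if self.passthrough_flag not in flags:
                return False
        meta = asset.get("meta") or {}
        return all(meta.get(k) == v for k, v in self.required_meta.items())


rules = SyncFilter(required_meta={"creator_id": "studio-a"})
event = {
    "type": "video.asset.ready",
    "data": {
        "id": "asset-123",
        "status": "ready",
        "passthrough": "mixpeek:index,campaign-7",
        "meta": {"creator_id": "studio-a"},
    },
}
print(rules.should_sync(event))  # True
```

Keeping the rules declarative means the same filter object can be evaluated on every incoming event without code changes when a team adds a new metadata condition.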

    2. Asset Ingest (Mixpeek Namespace)

    Filtered Mux assets are pulled into a Mixpeek namespace. Raw camera formats (RED R3D, ARRI RAW) are converted by custom plugins before processing.
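The plugin step can be pictured as a registry that routes files by extension. This is a hypothetical sketch: the `register` decorator, the converter functions, and the extension list (`.r3d`, `.ari`) are assumptions, and the converters are placeholders that only compute an output path rather than decoding anything.

```python
from pathlib import Path
from typing import Callable, Dict

# A converter takes a source file and returns a processing-friendly proxy path.
Converter = Callable[[Path], Path]
_CONVERTERS: Dict[str, Converter] = {}


def register(*extensions: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter plugin for one or more file extensions."""
    def wrap(fn: Converter) -> Converter:
        for ext in extensions:
            _CONVERTERS[ext.lower()] = fn
        return fn
    return wrap


@register(".r3d")
def convert_red(src: Path) -> Path:
    # Placeholder: a real plugin would decode R3D (e.g. via RED's SDK) and transcode.
    return src.with_suffix(".mp4")


@register(".ari")
def convert_arri(src: Path) -> Path:
    # Placeholder: a real plugin would debayer ARRIRAW frames and transcode.
    return src.with_suffix(".mp4")


def prepare_for_extraction(src: Path) -> Path:
    """Route raw camera files through a registered plugin; pass everything else through."""
    converter = _CONVERTERS.get(src.suffix.lower())
    return converter(src) if converter else src


print(prepare_for_extraction(Path("A001_C002.R3D")))  # A001_C002.mp4
print(prepare_for_extraction(Path("promo.mov")))      # promo.mov
```

A registry keeps format support additive: supporting another raw format means registering one more function, with no change to the ingest path.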

    3. Multimodal Decomposition (Extractors)

    Each video is decomposed into frames and audio segments. Extractors then run in parallel: visual embeddings, object detection, face identity, OCR, and speech transcription.
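Running extractors in parallel amounts to fanning every (segment, extractor) pair out to a worker pool and collecting one result row per segment. The sketch below assumes segments are plain dicts and uses trivial stand-in extractors; `run_extractors` and the extractor names are hypothetical, not Mixpeek functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Segment = Dict[str, Any]          # e.g. {"start": 0.0, "end": 2.0, ...}
Extractor = Callable[[Segment], Any]


def run_extractors(
    segments: List[Segment],
    extractors: Dict[str, Extractor],
    max_workers: int = 8,
) -> List[Dict[str, Any]]:
    """Fan every (segment, extractor) pair out to a thread pool; one result row per segment."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = [
            {name: pool.submit(fn, seg) for name, fn in extractors.items()}
            for seg in segments
        ]
        # .result() blocks until each job finishes and re-raises any extractor error.
        return [
            {"segment": seg, **{name: fut.result() for name, fut in futs.items()}}
            for seg, futs in zip(segments, pending)
        ]


# Stand-in extractors; real ones would call embedding, detection, OCR, or ASR models.
segments = [
    {"start": 0.0, "end": 2.0, "caption": "kickoff"},
    {"start": 2.0, "end": 4.5, "caption": "goal"},
]
extractors = {
    "ocr": lambda s: s["caption"].upper(),
    "duration_s": lambda s: s["end"] - s["start"],
}
for row in run_extractors(segments, extractors):
    print(row["segment"]["start"], row["ocr"], row["duration_s"])
```

Threads suit extractors that mostly wait on remote model services; CPU-bound local models would be better served by a process pool.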

    4. Feature Indexing (Collections)

    Extracted features are stored in Mixpeek collections with full lineage back to the source Mux asset, timestamp, and frame number.
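One way to picture a feature record with lineage is a small schema that keeps the Mux identifiers and time offsets next to the extracted payload. The `FeatureRecord` fields and `frame_at` helper are hypothetical, and storing the playback ID alongside the asset ID is our assumption; the thumbnail URL follows the pattern of Mux's image API (`image.mux.com/<playback_id>/thumbnail.jpg?time=<seconds>`).

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeatureRecord:
    """One extracted feature plus the lineage needed to jump back to its source."""
    mux_asset_id: str            # source asset in Mux
    mux_playback_id: str         # used to build preview URLs
    feature_type: str            # e.g. "visual_embedding", "ocr", "transcript"
    start_s: float               # segment start, in seconds from the start of the asset
    end_s: float
    frame_number: Optional[int] = None      # None for audio-only features
    text: Optional[str] = None              # OCR or transcript text, if any
    embedding: Optional[List[float]] = None

    def thumbnail_url(self) -> str:
        # Mux's image API serves a still frame at a given time offset.
        return f"https://image.mux.com/{self.mux_playback_id}/thumbnail.jpg?time={self.start_s:g}"


def frame_at(t_seconds: float, fps: float) -> int:
    """Zero-based index of the frame on screen at time t (assumes constant frame rate)."""
    return math.floor(t_seconds * fps)


rec = FeatureRecord(
    mux_asset_id="asset-123",
    mux_playback_id="pb-456",
    feature_type="ocr",
    start_s=12.5,
    end_s=13.0,
    frame_number=frame_at(12.5, 24.0),
    text="FINAL SCORE 2-1",
)
print(rec.frame_number, rec.thumbnail_url())
```

Because every hit carries its asset ID and time offset, a search result can link straight to a preview frame without a second lookup.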

    5. Visual Search Retriever (Feature Search + Filters)

    A retriever combines vector similarity, face identity matching, metadata filters, and full-text search across transcripts and OCR output.
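A toy, in-memory illustration of how those stages can compose: metadata filters and face identity act as hard constraints, then surviving documents are ranked by cosine similarity plus a weighted keyword score. This is not Mixpeek's retriever API; `retrieve`, the document shape, and the `text_weight` blend are assumptions, and the substring keyword match stands in for a real full-text scorer such as BM25.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    docs: List[Dict[str, Any]],
    query_vec: Sequence[float],
    *,
    face_id: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    keywords: Sequence[str] = (),
    text_weight: float = 0.5,
    k: int = 10,
) -> List[Tuple[float, str]]:
    """Filter on metadata and face identity, then rank by vector + keyword score."""
    filters = filters or {}
    scored = []
    for doc in docs:
        # Metadata filters and face identity act as hard constraints.
        if any(doc["meta"].get(key) != val for key, val in filters.items()):
            continue
        if face_id is not None and face_id not in doc["faces"]:
            continue
        # Crude full-text stand-in: fraction of keywords found in transcript/OCR text.
        text = doc["text"].lower()
        hits = sum(1 for kw in keywords if kw.lower() in text)
        text_score = hits / len(keywords) if keywords else 0.0
        scored.append((cosine(query_vec, doc["vec"]) + text_weight * text_score, doc["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


docs = [
    {"id": "a", "vec": [1.0, 0.0], "faces": {"p1"}, "meta": {"show": "x"}, "text": "Goal scored"},
    {"id": "b", "vec": [0.0, 1.0], "faces": set(), "meta": {"show": "x"}, "text": "Halftime show"},
    {"id": "c", "vec": [1.0, 0.0], "faces": {"p1"}, "meta": {"show": "y"}, "text": "Goal"},
]
print(retrieve(docs, [1.0, 0.0], filters={"show": "x"}, keywords=["halftime"]))
# [(1.0, 'a'), (0.5, 'b')]
```

A weighted sum is only one fusion choice; rank-based schemes such as reciprocal rank fusion avoid having to calibrate vector and text scores against each other.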

    6. Audit Trail (Batch Processing)

    Every pipeline step, from Mux webhook receipt through extraction completion, is logged, providing full observability and compliance lineage.
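An audit trail of this kind can be modeled as an append-only log of per-asset step events, which also makes gaps in an asset's lineage easy to detect. The sketch is hypothetical: `AuditLog`, the JSON-lines format, and the step names in `PIPELINE_STEPS` are illustrative assumptions, and entries are kept in memory rather than persisted.

```python
import json
import time
from typing import Any, Dict, List, Optional

# Hypothetical step names mirroring the pipeline stages.
PIPELINE_STEPS = ["webhook_received", "ingested", "extracted", "indexed"]


class AuditLog:
    """Append-only audit trail kept as JSON lines; a real system would persist each line."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def record(self, asset_id: str, step: str, status: str = "ok",
               ts: Optional[float] = None) -> None:
        entry = {"asset_id": asset_id, "step": step, "status": status,
                 "ts": time.time() if ts is None else ts}
        self._lines.append(json.dumps(entry, sort_keys=True))

    def history(self, asset_id: str) -> List[Dict[str, Any]]:
        entries = (json.loads(line) for line in self._lines)
        return [e for e in entries if e["asset_id"] == asset_id]

    def missing_steps(self, asset_id: str) -> List[str]:
        """Steps with no successful entry yet, i.e. gaps in the asset's lineage."""
        done = {e["step"] for e in self.history(asset_id) if e["status"] == "ok"}
        return [s for s in PIPELINE_STEPS if s not in done]


log = AuditLog()
log.record("asset-123", "webhook_received")
log.record("asset-123", "ingested")
log.record("asset-123", "extracted", status="error")
print(log.missing_steps("asset-123"))  # ['extracted', 'indexed']
```

Failed steps are logged rather than skipped, so the history shows both what happened and what is still outstanding for each asset.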

    Mux Integration Deep Dive

    Selective sync lets you control exactly which Mux assets flow into Mixpeek using metadata filters and passthrough flags. When a video is uploaded to Mux with the right metadata, a webhook fires and Mixpeek pulls the asset automatically. RAW formats (RED R3D, ARRI RAW) are converted via custom plugins before extraction. The pipeline decomposes each video into scene compositions, detected objects, recognized faces, on-screen text, and transcribed speech — then indexes everything into a visual search retriever with feature search, face identity, and full-text stages. An audit trail tracks every step from ingest to searchable index.
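Any endpoint that receives these webhooks should confirm a request really came from Mux before pulling the asset. Below is a minimal sketch based on our understanding of Mux's documented signing scheme: a `Mux-Signature` header of the form `t=<unix seconds>,v1=<hex>`, where the signature is an HMAC-SHA256 of `<timestamp>.<raw body>` keyed with the endpoint's signing secret. The function name and the 300-second replay tolerance are our assumptions, this does not describe Mixpeek's internals, and the header format should be confirmed against Mux's webhook documentation.

```python
import hashlib
import hmac
import time
from typing import Dict, List, Optional


def verify_mux_signature(raw_body: bytes, header: str, secret: str,
                         tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Check a header of the form 't=<unix seconds>,v1=<hex HMAC-SHA256>'."""
    fields: Dict[str, List[str]] = {}
    for item in header.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            fields.setdefault(key, []).append(value)
    try:
        ts_str = fields["t"][0]
        timestamp = int(ts_str)
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_s:
        return False  # too old (or from the future): treat as a possible replay
    # Signed payload is "<timestamp>.<raw request body>", keyed with the signing secret.
    signed = ts_str.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest().encode()
    return any(hmac.compare_digest(expected, sig.encode()) for sig in fields.get("v1", []))


body = b'{"type":"video.asset.ready"}'
ts = 1700000000
sig = hmac.new(b"whsec-demo", f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
print(verify_mux_signature(body, f"t={ts},v1={sig}", "whsec-demo", now=ts + 5))  # True
```

Verification must run on the raw request bytes: re-serializing parsed JSON can change whitespace or key order and break the signature.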

    Tags: video, object-storage, streaming, search, media, selective-sync

    Ready to integrate?

    Get started with Mixpeek + Mux in minutes. Read the docs, create a free account, or schedule a walkthrough with our team.