Overview

Mixpeek and Snowflake serve complementary roles in the modern data stack. Mixpeek decomposes unstructured files (images, video, audio, PDFs) into structured features and searchable documents. Snowflake stores, governs, and analyzes structured data at scale. Together, they close the gap between raw multimodal content and business-ready analytics.

Mixpeek

Ingests unstructured files, extracts features (embeddings, transcripts, classifications, metadata), and powers multimodal retrieval.

Snowflake

Stores structured outputs, enforces governance, and drives dashboards, ML pipelines, and cross-functional analytics.

Architecture

                        Mixpeek                              Snowflake
               +-----------------------+            +------------------------+
               |                       |            |                        |
  Files -----> |  Buckets & Collections|            |   Structured Tables    |
  (images,     |                       |            |                        |
   video,      |  Decompose files into |  export    |  - classifications     |
   audio,      |  features:            | ---------> |  - extracted metadata  |
   PDFs)       |   - embeddings        |            |  - taxonomy labels     |
               |   - transcripts       |            |  - document payloads   |
               |   - classifications   |            |                        |
               |   - metadata          |  enrich    |  Dashboards, BI, ML    |
               |                       | <--------- |  (feed back into       |
               |  Retrieval & Search   |            |   Mixpeek retrievers)  |
               +-----------------------+            +------------------------+

Use Cases

Export taxonomy classifications to Snowflake tables

After Mixpeek classifies your content with taxonomies, export the labels into Snowflake for reporting and governance.
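For reporting, it usually helps to store one row per document and label rather than a nested structure. The sketch below flattens a document into such rows. The `taxonomy_labels` field and its `taxonomy`, `label`, and `score` keys are hypothetical names chosen for illustration; check the actual shape of your classified documents before relying on them.

```python
from typing import Any, Dict, Iterator, Tuple

LabelRow = Tuple[str, str, str, float]


def taxonomy_rows(doc: Dict[str, Any]) -> Iterator[LabelRow]:
    """Yield (document_id, taxonomy, label, score) for each label on a document."""
    # "taxonomy_labels" is an assumed field name, not a confirmed Mixpeek schema.
    for item in doc.get("taxonomy_labels", []):
        yield (
            doc["document_id"],
            item["taxonomy"],
            item["label"],
            float(item.get("score", 0.0)),
        )
```

Each tuple maps onto a narrow Snowflake table (for example document_id, taxonomy, label, score), which keeps GROUP BY queries over labels simple.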

Feed extracted metadata into Snowflake dashboards

Mixpeek extracts rich metadata from every file it processes — transcripts, detected objects, face identities, brand logos, audio fingerprints. Load these structured outputs into Snowflake and build dashboards in Tableau, Sigma, or Snowsight.

Use Snowflake data to enrich Mixpeek retrievers

Pull structured attributes from Snowflake (pricing, inventory, customer segments) and attach them to Mixpeek documents via the sql-lookup or api-call retriever stages. This lets your multimodal search results carry business context.
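To make the enrichment idea concrete, here is a minimal client-side sketch of the join that a lookup stage performs: rows fetched from Snowflake are matched to search results on a shared key. It is not the sql-lookup stage configuration. The `sku` key, the `business` field, and the result shape are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List


def enrich_results(
    results: Iterable[Dict[str, Any]],
    attributes: Iterable[Dict[str, Any]],
    key: str = "sku",  # hypothetical join key present in both sources
) -> List[Dict[str, Any]]:
    """Attach Snowflake attribute rows to search results that share `key`."""
    by_key = {row[key]: row for row in attributes}
    enriched = []
    for doc in results:
        metadata = doc.get("metadata") or {}
        match = by_key.get(metadata.get(key), {})
        # Copy every attribute except the join key itself.
        business = {k: v for k, v in match.items() if k != key}
        enriched.append({**doc, "business": business})
    return enriched
```

Results with no matching row get an empty `business` dict, so downstream code can treat the field as always present.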

Quick Start

Export Mixpeek document metadata to a Snowflake table using the Mixpeek Python SDK and the Snowflake Connector.

1. Install dependencies

pip install mixpeek snowflake-connector-python

2. List documents from Mixpeek

from mixpeek import Mixpeek

client = Mixpeek(api_key="your-api-key")

# List documents from a collection
documents = client.collections.documents.list(
    collection_id="your-collection-id",
    page_size=100
)
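The write step reads each document with dict-style `.get()` calls. Whether the SDK returns plain dicts or attribute-style objects may depend on the SDK version, so a small normalizer can make the export tolerant of both. This is a sketch: the field names mirror the ones used in this guide, and the helper names `_field` and `to_row` are hypothetical.

```python
import json
from typing import Any, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str], str, Optional[str]]


def _field(obj: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a dict or an attribute-style object."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def to_row(doc: Any) -> Row:
    """Convert one document into the column order of mixpeek_documents."""
    source = _field(doc, "source") or {}
    metadata = _field(doc, "metadata") or {}
    return (
        _field(doc, "document_id"),
        _field(source, "url"),
        _field(doc, "content_type"),
        json.dumps(metadata, default=str),  # serialized for PARSE_JSON
        _field(doc, "created_at"),
    )
```

Missing fields come back as None (or "{}" for metadata), which Snowflake stores as NULL or an empty object rather than failing the insert.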

3. Write to Snowflake

import snowflake.connector
import json

conn = snowflake.connector.connect(
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    account="YOUR_ACCOUNT",
    warehouse="YOUR_WAREHOUSE",
    database="MIXPEEK_DATA",
    schema="PUBLIC"
)

cursor = conn.cursor()

# Create table if it does not exist
cursor.execute("""
    CREATE TABLE IF NOT EXISTS mixpeek_documents (
        document_id VARCHAR,
        source_url VARCHAR,
        content_type VARCHAR,
        metadata VARIANT,
        created_at TIMESTAMP_NTZ
    )
""")

# Insert each document (assumes each item is a dict-like record).
# Snowflake rejects PARSE_JSON inside a VALUES clause, so use INSERT ... SELECT.
for doc in documents:
    cursor.execute(
        """
        INSERT INTO mixpeek_documents
            (document_id, source_url, content_type, metadata, created_at)
        SELECT %s, %s, %s, PARSE_JSON(%s), %s
        """,
        (
            doc.get("document_id"),
            (doc.get("source") or {}).get("url"),
            doc.get("content_type"),
            json.dumps(doc.get("metadata") or {}),
            doc.get("created_at"),
        )
    )

conn.commit()
cursor.close()
conn.close()
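
For production workloads, replace the row-by-row inserts with Snowflake’s COPY INTO on staged files, or with Snowpipe for continuous loading. Each single-row INSERT is a separate round trip to the warehouse, which becomes slow and costly as document counts grow.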
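A bulk load along those lines could look like the following sketch: documents are written to a newline-delimited JSON file, uploaded to the table stage with PUT, and loaded with a single COPY INTO. The helper names are hypothetical, and the JSON paths assume the same document fields used in this guide.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def write_ndjson(docs: Iterable[Dict[str, Any]], path: Path) -> int:
    """Write one JSON object per line and return the number of documents."""
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(json.dumps(doc, default=str) + "\n")
            count += 1
    return count


def bulk_load_statements(path: Path, table: str = "mixpeek_documents") -> List[str]:
    """Build the PUT and COPY INTO statements for a table-stage load."""
    stage = f"@%{table}"  # every Snowflake table has an implicit table stage
    put = (
        f"PUT 'file://{path.resolve().as_posix()}' {stage} "
        "AUTO_COMPRESS = TRUE OVERWRITE = TRUE"
    )
    copy = f"""COPY INTO {table}
    (document_id, source_url, content_type, metadata, created_at)
FROM (
    SELECT $1:document_id::VARCHAR,
           $1:source.url::VARCHAR,
           $1:content_type::VARCHAR,
           $1:metadata,
           $1:created_at::TIMESTAMP_NTZ
    FROM {stage}
)
FILE_FORMAT = (TYPE = 'JSON')
PURGE = TRUE"""
    return [put, copy]
```

With an open connection, the load is then `for stmt in bulk_load_statements(path): cursor.execute(stmt)`. PURGE = TRUE removes the staged file after a successful load; drop it if you want to keep staged files for auditing.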

When to Use Each

Capability                                                  | Mixpeek            | Snowflake
------------------------------------------------------------+--------------------+-------------------------------
Ingest unstructured files (video, images, audio, PDFs)      | Yes                | No
Extract features (embeddings, transcripts, classifications) | Yes                | No
Multimodal semantic search                                  | Yes                | No
Structured SQL analytics                                    | No                 | Yes
Data governance and access control                          | Document-level ACL | Role-based, column-level
Dashboard and BI integration                                | No                 | Yes (Snowsight, Tableau, etc.)
ML feature store                                            | Embedding vectors  | Tabular features

Mixpeek handles everything before the data is structured. Snowflake handles everything after. Use both to get a complete pipeline from raw files to business insights.

  • Taxonomies — classify content and export labels
  • SQL Lookup Stage — query external databases from retriever pipelines
  • API Call Stage — call external APIs during retrieval
  • Webhooks — trigger Snowflake loads when Mixpeek processing completes