Weaviate is a vector database with built-in vectorization modules. Mixpeek is a multimodal data warehouse that goes further: decomposing files into layered features, storing them across cost tiers, and reassembling answers through multi-stage retrieval pipelines. This guide walks you through migrating your search workload from Weaviate to Mixpeek.

Why Migrate

Weaviate introduced multimodal capabilities with modules like multi2vec-clip and img2vec-neural. Mixpeek builds on this direction but takes a fundamentally different approach.
Where Weaviate adds vector search to a database, Mixpeek starts from the file itself. A single video becomes transcripts, visual embeddings, scene descriptions, and detected entities. Each layer is independently searchable, stored across cost tiers, and reassembled through configurable pipelines.
| Weaviate | Mixpeek |
| --- | --- |
| Vectorization modules bolt onto storage | Feature extraction is the core of the pipeline |
| GraphQL queries with vector search | Multi-stage retrieval pipelines: search, filter, rerank, enrich |
| Single storage tier | Tiered storage: hot, cold, archive, up to 90% savings |
| Schema-per-class design | Namespace-level organization with collections per processing pipeline |
| Module-based multimodal support | Native decomposition: one file becomes many searchable layers |

Concept Mapping

| Weaviate | Mixpeek | Notes |
| --- | --- | --- |
| Class | Collection | Defines processing pipeline and feature extraction for a data type |
| Property | Document field | Documents contain extracted features, metadata, and source lineage |
| Object (with vector) | Document (with features) | Documents hold multiple feature types, not just one vector |
| Module (e.g., text2vec-openai) | Feature Extractor | Built-in extractors: multimodal, image, text, face-identity, document, and more |
| GraphQL query | Retriever execution | Retrievers are multi-stage pipelines, not single queries |
| nearText / nearVector | Semantic search stage | One stage in a larger pipeline |
| where filter | Attribute filter stage | Filters compose as pipeline stages alongside search and ranking |
| Cross-reference | Semantic JOIN / Taxonomy | Connect documents across collections using vector similarity |

Migration Steps

1. Create a Namespace

Replace your Weaviate instance with a Mixpeek namespace.
curl -X POST https://api.mixpeek.com/v1/namespaces \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace_name": "media-library"
  }'

2. Map Classes to Collections

Each Weaviate class becomes a Mixpeek collection with a feature extractor. Instead of choosing a vectorization module, you choose an extractor that matches your content type.
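# Weaviate class definition (Python client v3 API)
import weaviate

# Assumed local instance URL; replace with your own endpoint
client = weaviate.Client("http://localhost:8080")

client.schema.create_class({
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "body", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]}
    ]
})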
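
The mapping from a class to a collection can be sketched as a small planning helper that you review before creating anything. The extractor names come from the Concept Mapping section of this guide. The pairing of each Weaviate module with an extractor and the output keys (`collection_name`, `feature_extractor`, `metadata_fields`) are illustrative assumptions, not the Mixpeek collection schema, so check the API reference before sending a real request.

```python
# Hypothetical planning helper. The module-to-extractor pairing and the
# output keys are assumptions for illustration, not the Mixpeek schema.
ASSUMED_EXTRACTOR_FOR_MODULE = {
    "text2vec-openai": "text",
    "multi2vec-clip": "multimodal",
    "img2vec-neural": "image",
}


def plan_collection(weaviate_class: dict) -> dict:
    """Turn a Weaviate class definition into a collection plan for review."""
    vectorizer = weaviate_class.get("vectorizer", "none")
    extractor = ASSUMED_EXTRACTOR_FOR_MODULE.get(vectorizer)
    if extractor is None:
        raise ValueError(f"No assumed extractor for module {vectorizer!r}")
    return {
        "collection_name": weaviate_class["class"].lower(),
        "feature_extractor": extractor,
        # Weaviate properties carry over as document fields.
        "metadata_fields": [p["name"] for p in weaviate_class.get("properties", [])],
    }


article_class = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "body", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]},
    ],
}

plan = plan_collection(article_class)
print(plan)
```

Running the helper over every class in your Weaviate schema gives a checklist of collections to create and flags any module that has no obvious extractor counterpart.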

3. Re-ingest Your Data

Upload source files through the Mixpeek pipeline instead of importing Weaviate objects. The pipeline extracts richer features than a single vectorization module.
Do not export vectors from Weaviate and import them into Mixpeek. Re-ingest your source files so the pipeline can extract multi-layered features, build lineage, and index across modalities.
# Upload objects to a bucket for processing.
# `client` here is an initialized Mixpeek SDK client, not the Weaviate client from step 2.
client.objects.create(
    bucket_id="your-bucket-id",
    key_prefix="/articles",
    blobs=[
        {"property": "content", "url": "s3://your-bucket/article-001.pdf"}
    ],
    namespace="media-library"
)
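
Re-ingesting a whole corpus usually means repeating the call above once per source file. The sketch below only builds the keyword arguments for each call, mirroring the shape of the `client.objects.create` example. Creating one object per file and the helper name `build_upload_requests` are assumptions for illustration, and the file URLs are placeholders.

```python
def build_upload_requests(source_urls, bucket_id, namespace, key_prefix="/articles"):
    """Build one keyword-argument dict per source file for client.objects.create."""
    requests = []
    for url in source_urls:
        requests.append({
            "bucket_id": bucket_id,
            "key_prefix": key_prefix,
            # One blob per object, stored under the "content" property.
            "blobs": [{"property": "content", "url": url}],
            "namespace": namespace,
        })
    return requests


# Placeholder source files; replace with the originals behind your Weaviate objects.
source_urls = [
    "s3://your-bucket/article-001.pdf",
    "s3://your-bucket/article-002.pdf",
]

upload_requests = build_upload_requests(source_urls, "your-bucket-id", "media-library")
print(len(upload_requests), "uploads planned")

# With an initialized Mixpeek client, each dict would be passed along as:
# for kwargs in upload_requests:
#     client.objects.create(**kwargs)
```

Keeping the planning step separate from the upload makes it easy to dry-run the list against your source inventory before any data moves.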

4. Replace GraphQL Queries with Retrievers

Weaviate’s GraphQL queries map to Mixpeek retrievers. The difference: retrievers chain multiple stages together.
{
  Get {
    Article(
      nearText: { concepts: ["machine learning trends"] }
      where: {
        path: ["category"]
        operator: Equal
        valueText: "technology"
      }
      limit: 10
    ) {
      title
      body
      _additional { distance }
    }
  }
}
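
To make the correspondence concrete, the sketch below translates the pieces of the query above into the retriever shape used in steps 5 and 6: `nearText` concepts become the `query` input, the `Equal` filter becomes an `attribute_filter` stage, and `limit` carries over to the execute call. The function name `translate_query` and the choice to join concepts with spaces are assumptions. Only `Equal` filters on a single path are covered.

```python
def translate_query(class_name, concepts, where, limit, candidate_k=50):
    """Map nearText plus an Equal `where` filter onto retriever stages and inputs."""
    if where["operator"] != "Equal":
        raise NotImplementedError("this sketch only covers Equal filters")
    field = where["path"][0]
    retriever = {
        "name": f"{class_name.lower()}-search",
        "stages": [
            {
                "type": "semantic_search",
                # Fetch more candidates than `limit`, since the filter runs afterwards.
                "config": {"query": "{{INPUT.query}}", "top_k": candidate_k},
            },
            {
                "type": "attribute_filter",
                "config": {"filters": {field: "{{INPUT." + field + "}}"}},
            },
        ],
    }
    execute_body = {
        "inputs": {"query": " ".join(concepts), field: where["valueText"]},
        "limit": limit,
    }
    return retriever, execute_body


retriever, execute_body = translate_query(
    class_name="Article",
    concepts=["machine learning trends"],
    where={"path": ["category"], "operator": "Equal", "valueText": "technology"},
    limit=10,
)
print(retriever["name"])
print(execute_body)
```

Because the filter stage runs after the search stage in this layout, the search stage requests more candidates than the final limit so that filtering does not leave the result set short.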

5. Build Multi-Stage Pipelines

Go beyond what Weaviate’s query language supports. Chain semantic search with attribute filters, reranking, and enrichment in a single retriever.
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: media-library" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "article-search",
    "stages": [
      {
        "type": "semantic_search",
        "config": {
          "query": "{{INPUT.query}}",
          "top_k": 50
        }
      },
      {
        "type": "attribute_filter",
        "config": {
          "filters": {
            "category": "{{INPUT.category}}"
          }
        }
      },
      {
        "type": "rerank",
        "config": {
          "top_k": 10
        }
      }
    ]
  }'

6. Test and Verify

Execute retrievers and validate results against your Weaviate baseline.
curl -X POST https://api.mixpeek.com/v1/retrievers/{retriever_id}/execute \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: media-library" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "machine learning trends",
      "category": "technology"
    },
    "limit": 10
  }'
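
One simple way to compare against the Weaviate baseline is top-k overlap: for the same query, what fraction of the baseline's top results also appear in the new results. The sketch below assumes both systems can be keyed by a shared identifier, such as the source file path, since re-ingested documents will not keep their Weaviate object IDs. The IDs shown are made-up sample data.

```python
def overlap_at_k(baseline_ids, candidate_ids, k=10):
    """Fraction of the baseline's top-k results that also appear in the candidate's top-k."""
    base = baseline_ids[:k]
    if not base:
        return 0.0
    cand = set(candidate_ids[:k])
    return sum(1 for doc_id in base if doc_id in cand) / len(base)


# Made-up sample rankings for one query, keyed by a shared source identifier.
weaviate_top = ["a1", "a2", "a3", "a4"]
mixpeek_top = ["a2", "a1", "a5", "a3"]

score = overlap_at_k(weaviate_top, mixpeek_top, k=4)
print(f"overlap@4 = {score:.2f}")  # prints "overlap@4 = 0.75"
```

A low overlap is not automatically a regression, because the new pipeline extracts different features. It does tell you which queries deserve a manual look.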

What You Gain

| Capability | Weaviate | Mixpeek |
| --- | --- | --- |
| File decomposition | Vectorize one property at a time | Decompose a file into multiple searchable layers automatically |
| Multi-stage retrieval | Single query with optional filters | Chain search, filter, rerank, and enrich stages in one pipeline |
| Tiered storage | All data at one storage tier | Hot, cold, and archive tiers, up to 90% savings |
| Cross-modal search | Modules per class, limited cross-modal | Native cross-modal: text query finds video moments, audio segments, image regions |
| No infrastructure | Self-hosted or managed cluster | Fully managed API, no clusters, no module configuration |
| Complete lineage | Objects in a class | Trace results back through document, object, and source file |
If you are using Weaviate’s multi2vec-clip for image-text search, Mixpeek’s multimodal extractor handles the same use case and adds video, audio, and document support in the same namespace.

Next Steps

Quickstart: Get Mixpeek running in 10 minutes

Feature Extractors: Learn about automatic feature extraction

Retrievers: Build multi-stage retrieval pipelines

Core Concepts: Understand the data model