Migrate from Weaviate

Weaviate is a vector database with built-in vectorization modules. Mixpeek is a multimodal data warehouse that goes further: it decomposes files into layered features, stores them across cost tiers, and reassembles answers through multi-stage retrieval pipelines. This guide walks you through migrating your search workload from Weaviate to Mixpeek.

Why Migrate

Weaviate introduced multimodal capabilities with modules like multi2vec-clip and img2vec-neural. Mixpeek builds on this direction but takes a fundamentally different approach.

Where Weaviate adds vector search to a database, Mixpeek starts from the file itself. A single video becomes transcripts, visual embeddings, scene descriptions, and detected entities. Each layer is independently searchable, stored across cost tiers, and reassembled at query time through configurable pipelines.

Weaviate	Mixpeek
Vectorization modules bolt onto storage	Feature extraction is the core of the pipeline
GraphQL queries with vector search	Multi-stage retrieval pipelines: search, filter, rerank, enrich
Single storage tier	Tiered storage: hot, cold, archive, up to 90% savings
Schema-per-class design	Namespace-level organization with collections per processing pipeline
Module-based multimodal support	Native decomposition: one file becomes many searchable layers

Concept Mapping

Weaviate	Mixpeek	Notes
Class	Collection	Defines processing pipeline and feature extraction for a data type
Property	Document field	Documents contain extracted features, metadata, and source lineage
Object (with vector)	Document (with features)	Documents hold multiple feature types, not just one vector
Module (e.g., `text2vec-openai`)	Feature Extractor	Built-in extractors: multimodal, image, text, face-identity, document, and more
GraphQL query	Retriever execution	Retrievers are multi-stage pipelines, not single queries
`nearText` / `nearVector`	Semantic search stage	One stage in a larger pipeline
`where` filter	Attribute filter stage	Filters compose as pipeline stages alongside search and ranking
Cross-reference	Semantic JOIN / Taxonomy	Connect documents across collections using vector similarity

Migration Steps

Create a Namespace

Replace your Weaviate instance with a Mixpeek namespace.

curl -X POST https://api.mixpeek.com/v1/namespaces \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace_name": "media-library"
  }'
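The same request can be issued from Python. This is a minimal sketch using only the standard library `urllib.request`: the endpoint, headers, and `namespace_name` field come from the curl call above, while the helper names `build_namespace_request` and `create_namespace` are illustrative, and parsing the response as JSON is an assumption.

```python
import json
import urllib.request

API_BASE = "https://api.mixpeek.com/v1"


def build_namespace_request(api_key: str, namespace_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST /namespaces request from the curl example."""
    body = json.dumps({"namespace_name": namespace_name}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/namespaces",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_namespace(api_key: str, namespace_name: str) -> dict:
    """Send the request and return the response body, assuming it is JSON."""
    req = build_namespace_request(api_key, namespace_name)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `create_namespace("YOUR_API_KEY", "media-library")` sends the request. Keeping construction separate from sending makes the request easy to inspect or test without network access.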

Map Classes to Collections

Each Weaviate class becomes a Mixpeek collection with a feature extractor. Instead of choosing a vectorization module, you choose an extractor that matches your content type.

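# Weaviate class definition (weaviate-client v3 Python API)
import weaviate

client = weaviate.Client("http://localhost:8080")  # point at your Weaviate instance
client.schema.create_class({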
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "body", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]}
    ]
})
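Translating a class definition mostly comes down to picking an extractor per vectorizer and carrying properties over as document fields. The sketch below uses a hypothetical payload shape (`collection_name`, `feature_extractor`, `metadata_fields`) that is not taken from the Mixpeek API reference, so check the Collections documentation for the real field names. The vectorizer-to-extractor pairings are also assumptions, except that the multimodal extractor is the suggested replacement for `multi2vec-clip`.

```python
# Hypothetical mapping from Weaviate vectorizer modules to Mixpeek extractor types.
# Extractor names come from the concept-mapping table; the pairings are assumptions.
VECTORIZER_TO_EXTRACTOR = {
    "text2vec-openai": "text",
    "multi2vec-clip": "multimodal",
    "img2vec-neural": "image",
}


def class_to_collection(weaviate_class: dict) -> dict:
    """Convert a Weaviate class definition into a hypothetical collection payload."""
    vectorizer = weaviate_class.get("vectorizer", "none")
    extractor = VECTORIZER_TO_EXTRACTOR.get(vectorizer)
    if extractor is None:
        raise ValueError(f"No extractor mapping for vectorizer {vectorizer!r}")
    return {
        # Weaviate class names are capitalized; this sketch lowercases them.
        "collection_name": weaviate_class["class"].lower(),
        "feature_extractor": {"type": extractor},
        # Properties carry over as document fields.
        "metadata_fields": [p["name"] for p in weaviate_class.get("properties", [])],
    }
```

Passing the `Article` definition above yields a `text` extractor with `title`, `body`, and `category` as fields. If your articles are PDFs, the document extractor may be a better fit than a plain text mapping.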

Re-ingest Your Data

Upload source files through the Mixpeek pipeline instead of importing Weaviate objects. The pipeline extracts richer features than a single vectorization module.

Do not export vectors from Weaviate and import them into Mixpeek. Re-ingest your source files so the pipeline can extract multi-layered features, build lineage, and index across modalities.

# Upload objects to a bucket for processing.
# `client` here is a Mixpeek SDK client, not the Weaviate client from the previous step.
client.objects.create(
    bucket_id="your-bucket-id",
    key_prefix="/articles",
    blobs=[
        {"property": "content", "url": "s3://your-bucket/article-001.pdf"}
    ],
    namespace="media-library"
)
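When you have many source files, it helps to generate one upload payload per file. This minimal sketch reuses the field names from the `client.objects.create` call above (`bucket_id`, `key_prefix`, `blobs`, `property`, `url`, `namespace`); the helper name, the one-object-per-file layout, and the duplicate-skipping behavior are illustrative choices.

```python
from typing import Iterable


def build_object_payloads(
    source_urls: Iterable[str],
    bucket_id: str,
    key_prefix: str,
    namespace: str,
    blob_property: str = "content",
) -> list[dict]:
    """Return one object-creation payload per unique source file URL."""
    payloads = []
    seen = set()
    for url in source_urls:
        if url in seen:
            # Skip duplicates so the same file is not processed twice.
            continue
        seen.add(url)
        payloads.append(
            {
                "bucket_id": bucket_id,
                "key_prefix": key_prefix,
                "blobs": [{"property": blob_property, "url": url}],
                "namespace": namespace,
            }
        )
    return payloads
```

Each payload can then be unpacked into the upload call, for example `client.objects.create(**payload)`, assuming the SDK accepts these keyword arguments as in the example above.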

Replace GraphQL Queries with Retrievers

Weaviate’s GraphQL queries map to Mixpeek retrievers. The difference: retrievers chain multiple stages together.

{
  Get {
    Article(
      nearText: { concepts: ["machine learning trends"] }
      where: {
        path: ["category"]
        operator: Equal
        valueText: "technology"
      }
      limit: 10
    ) {
      title
      body
      _additional { score }
    }
  }
}
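Because the query text and filter value are runtime inputs to a retriever, an existing GraphQL query usually translates into an execute request body rather than a new pipeline. This sketch assumes the GraphQL arguments are available as a Python dict mirroring the query above, that multiple `nearText` concepts can be joined into one query string, and that only single-property `Equal` filters are in use; other operators are rejected rather than guessed at. The input names match the `{{INPUT.query}}` and `{{INPUT.category}}` placeholders in the `article-search` retriever.

```python
def graphql_args_to_execute_body(args: dict) -> dict:
    """Translate Weaviate nearText/where/limit arguments into a retriever execute body."""
    concepts = args["nearText"]["concepts"]
    inputs = {"query": " ".join(concepts)}

    where = args.get("where")
    if where is not None:
        if where.get("operator") != "Equal" or len(where["path"]) != 1:
            raise ValueError("Only single-property Equal filters are translated in this sketch")
        # The filtered property name becomes the retriever input name, e.g. "category".
        inputs[where["path"][0]] = where["valueText"]

    # A default of 10 is chosen for this sketch when the query sets no limit.
    return {"inputs": inputs, "limit": args.get("limit", 10)}
```

Applied to the query above, this produces the same body used in the execute call under Test and Verify.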

Build Multi-Stage Pipelines

Go beyond what Weaviate’s query language supports. Chain semantic search with attribute filters, reranking, and enrichment in a single retriever.

curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: media-library" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "article-search",
    "stages": [
      {
        "type": "semantic_search",
        "config": {
          "query": "{{INPUT.query}}",
          "top_k": 50
        }
      },
      {
        "type": "attribute_filter",
        "config": {
          "filters": {
            "category": "{{INPUT.category}}"
          }
        }
      },
      {
        "type": "rerank",
        "config": {
          "top_k": 10
        }
      }
    ]
  }'
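If you create retrievers from code, generating the stage list keeps the over-fetch ratio explicit: the semantic stage retrieves a wide candidate set (`top_k: 50`) and the rerank stage trims it to the final results (`top_k: 10`). This sketch reproduces the JSON above; the function name and the check that the rerank `top_k` does not exceed the search `top_k` are illustrative choices, not documented API constraints.

```python
def build_article_search_retriever(search_top_k: int = 50, rerank_top_k: int = 10) -> dict:
    """Build the article-search retriever definition: search -> filter -> rerank."""
    if rerank_top_k > search_top_k:
        # The reranker only reorders candidates that earlier stages returned.
        raise ValueError("rerank_top_k must not exceed search_top_k")
    return {
        "name": "article-search",
        "stages": [
            {
                "type": "semantic_search",
                "config": {"query": "{{INPUT.query}}", "top_k": search_top_k},
            },
            {
                "type": "attribute_filter",
                "config": {"filters": {"category": "{{INPUT.category}}"}},
            },
            {"type": "rerank", "config": {"top_k": rerank_top_k}},
        ],
    }
```

Serializing the result with `json.dumps` gives the request body for the curl call above. Over-fetching before the filter leaves enough candidates for the reranker even when the category filter removes many of them.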

Test and Verify

Execute the retriever with the same query and filter values you used in Weaviate, then compare the results against your Weaviate baseline.

curl -X POST https://api.mixpeek.com/v1/retrievers/{retriever_id}/execute \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: media-library" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "machine learning trends",
      "category": "technology"
    },
    "limit": 10
  }'
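One simple way to compare against the Weaviate baseline is to run the same queries through both systems and measure how much the top-k result sets overlap. This minimal sketch assumes you can map each result from either system to a shared identifier, such as the source file URL that Mixpeek lineage traces back to. Response parsing is left out because it depends on the response shape, and the 0.6 threshold is an arbitrary example.

```python
def overlap_at_k(baseline_ids: list[str], candidate_ids: list[str], k: int = 10) -> float:
    """Fraction of the baseline's top-k IDs that also appear in the candidate's top-k."""
    baseline_top = set(baseline_ids[:k])
    if not baseline_top:
        return 0.0
    candidate_top = set(candidate_ids[:k])
    return len(baseline_top & candidate_top) / len(baseline_top)


def flag_low_overlap(
    results: dict[str, tuple[list[str], list[str]]],
    k: int = 10,
    threshold: float = 0.6,
) -> list[str]:
    """Return queries whose (weaviate_ids, mixpeek_ids) overlap falls below the threshold."""
    return [
        query
        for query, (weaviate_ids, mixpeek_ids) in results.items()
        if overlap_at_k(weaviate_ids, mixpeek_ids, k) < threshold
    ]
```

Low overlap is not automatically a regression: because Mixpeek indexes more feature layers, it may surface relevant results that Weaviate missed, so review flagged queries by hand.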

What You Gain

Capability	Weaviate	Mixpeek
File decomposition	Vectorize one property at a time	Decompose a file into multiple searchable layers automatically
Multi-stage retrieval	Single query with optional filters	Chain search, filter, rerank, and enrich stages in one pipeline
Tiered storage	All data at one storage tier	Hot, cold, and archive tiers, up to 90% savings
Cross-modal search	Modules per class, limited cross-modal	Native cross-modal: text query finds video moments, audio segments, image regions
No infrastructure	Self-hosted or managed cluster	Fully managed API, no clusters, no module configuration
Complete lineage	Objects in a class	Trace results back through document, object, and source file

If you are using Weaviate’s multi2vec-clip for image-text search, Mixpeek’s multimodal extractor handles the same use case and adds video, audio, and document support in the same namespace.

Next Steps

Quickstart

Get Mixpeek running in 10 minutes

Feature Extractors

Learn about automatic feature extraction

Retrievers

Build multi-stage retrieval pipelines

Core Concepts

Understand the data model
