dense_vector), external embedding generation, and custom pipelines to handle multimodal content. Mixpeek unifies all of this: feature extraction, tiered storage, and multi-stage retrieval in a single API.
This guide walks you through migrating your search workload from Elasticsearch to Mixpeek.
Why Migrate
Elasticsearch started as a keyword search engine. Vector search, embedding generation, and multimodal processing are additions you configure and maintain yourself. Mixpeek was built from the ground up as a multimodal data warehouse where feature extraction, storage tiering, and multi-stage retrieval are native primitives, not plugins.
| Elasticsearch | Mixpeek |
|---|---|
| Keyword search (BM25) with bolted-on vector search | Native semantic, keyword, and hybrid search in one pipeline |
| You generate and manage embeddings externally | Feature extractors handle embedding generation automatically |
| Single storage tier by default (hot-warm-cold tiering requires manually configured ILM policies) | Automatic tiered storage: active, cold, archive, up to 90% savings |
| Complex DSL for combining query types | Multi-stage retriever pipelines: chain stages declaratively |
| Ingest pipelines for basic transforms | Collections with ML-powered feature extractors (CLIP, Whisper, LayoutLM) |
Concept Mapping
| Elasticsearch | Mixpeek | Notes |
|---|---|---|
| Index | Namespace | Top-level container for your data |
| Document | Document | Mixpeek documents contain extracted features, metadata, and source lineage |
| Mapping / Schema | Collection + Feature Extractor | Collections define what features to extract; schema is derived automatically |
| DSL query | Retriever (with stages) | Stages are like query clauses, composable and ordered |
| bool query (must/should/filter) | Multi-stage pipeline | Each clause becomes a stage: search, filter, boost, rerank |
| Aggregation | Reduce stages / Taxonomies | Group, classify, and summarize results |
| Ingest pipeline | Collection + Feature Extractors | Mixpeek pipelines extract ML features, not just field transforms |
| Analyzer (tokenizer + filters) | Feature Extractor configuration | Extractors handle tokenization, embedding, and structured extraction |
| ILM (Index Lifecycle Management) | Automatic storage tiering | Active, cold, and archive tiers managed by the platform |
Migration Steps
Replace Index Mappings with Collections
Instead of defining field types and analyzers, create a collection with a feature extractor that matches your content.
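As a planning aid for this step, the sketch below reads an existing Elasticsearch index body and sorts its fields into three groups: text fields that a feature extractor would process, plain fields that carry over as metadata, and `dense_vector` fields that held externally generated embeddings. The output dictionary is a hypothetical shape invented for illustration (including the `text_extractor` name and every key in it); it is not the Mixpeek collection schema, so check the API reference before sending anything.

```python
from typing import Any


def plan_collection(index_name: str, index_body: dict[str, Any]) -> dict[str, Any]:
    """Sort Elasticsearch mapped fields into extractor inputs, metadata, and old vectors."""
    properties = index_body.get("mappings", {}).get("properties", {})
    extractor_inputs: list[str] = []
    metadata_fields: list[str] = []
    dropped_vectors: list[str] = []

    for name, spec in properties.items():
        field_type = spec.get("type")
        if field_type == "text":
            # Analyzed text is what an extractor would embed.
            extractor_inputs.append(name)
        elif field_type == "dense_vector":
            # Externally generated embeddings; assumed to be regenerated by the extractor.
            dropped_vectors.append(name)
        else:
            # keyword, date, numeric, etc. are treated as plain metadata.
            metadata_fields.append(name)

    # HYPOTHETICAL plan shape, for illustration only (not a real Mixpeek payload).
    return {
        "collection_name": index_name,
        "feature_extractor": {"name": "text_extractor", "input_fields": extractor_inputs},
        "metadata_fields": metadata_fields,
        "dropped_vector_fields": dropped_vectors,
    }


# Example Elasticsearch index body (standard mappings/properties layout).
index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "body": {"type": "text"},
            "category": {"type": "keyword"},
            "published": {"type": "date"},
            "body_embedding": {"type": "dense_vector", "dims": 384},
        }
    }
}

plan = plan_collection("articles", index_body)
print(plan)
```

The analyzer setting on `title` is deliberately ignored here, on the assumption that tokenization moves into the extractor configuration as described in the concept mapping.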
Replace Ingest Pipelines with Feature Extraction
Elasticsearch ingest pipelines handle basic field transforms. Mixpeek collections run ML models on your content: generating embeddings, extracting entities, transcribing audio, and more.
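Before retiring an ingest pipeline, it can help to inventory what it actually does. The sketch below walks an Elasticsearch ingest pipeline definition and groups its processors into simple field transforms, ML steps, and anything that needs manual review. The grouping itself is an assumption made for illustration: it treats `inference` processors as candidates for replacement by a feature extractor, and it does not claim that Mixpeek handles any particular processor for you. The `model_id` value is a placeholder.

```python
from typing import Any

# Elasticsearch processor types that only reshape fields.
SIMPLE_TRANSFORMS = {"lowercase", "uppercase", "trim", "rename", "set", "remove"}
# Processor types that run a model; assumed here to be replaceable by an extractor.
ML_PROCESSORS = {"inference"}


def inventory_pipeline(pipeline: dict[str, Any]) -> dict[str, list[str]]:
    """Group the processors of an ingest pipeline definition by kind."""
    report: dict[str, list[str]] = {
        "field_transforms": [],
        "ml_steps": [],
        "needs_review": [],
    }
    for processor in pipeline.get("processors", []):
        # Each processor is a dict with a single key: the processor type.
        for kind in processor:
            if kind in ML_PROCESSORS:
                report["ml_steps"].append(kind)
            elif kind in SIMPLE_TRANSFORMS:
                report["field_transforms"].append(kind)
            else:
                report["needs_review"].append(kind)
    return report


pipeline = {
    "description": "Prepare articles for search",
    "processors": [
        {"trim": {"field": "title"}},
        {"lowercase": {"field": "category"}},
        {"inference": {"model_id": "my-embedding-model"}},  # placeholder model id
        {"attachment": {"field": "data"}},
    ],
}

report = inventory_pipeline(pipeline)
print(report)
```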
Translate DSL Queries to Retriever Stages
Elasticsearch’s query DSL maps naturally to Mixpeek retriever stages. Each DSL clause becomes a stage in the pipeline.
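One way to picture the translation is a small function that flattens a `bool` query into an ordered list of stages. The sketch below assumes the mapping `must` to search, `filter` to filter, and `should` to boost, following the concept mapping table. The stage dictionaries (`stage`, `source_clause`) are a hypothetical format used only to show the ordering; they are not the Mixpeek retriever schema, and `must_not` is left unhandled.

```python
from typing import Any

# Assumed correspondence between bool clauses and stage types.
CLAUSE_TO_STAGE = {"must": "search", "filter": "filter", "should": "boost"}


def as_list(value: Any) -> list[Any]:
    """Elasticsearch accepts a single clause or a list of clauses; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def translate_bool_query(request: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a bool query into an ordered list of hypothetical stage descriptions."""
    bool_query = request.get("query", {}).get("bool", {})
    stages: list[dict[str, Any]] = []
    for clause_name, stage_type in CLAUSE_TO_STAGE.items():
        for clause in as_list(bool_query.get(clause_name)):
            # Keep the original clause so each stage can be filled in by hand later.
            stages.append({"stage": stage_type, "source_clause": clause})
    return stages


request = {
    "query": {
        "bool": {
            "must": {"match": {"body": "vector search"}},
            "filter": [
                {"term": {"category": "guides"}},
                {"range": {"published": {"gte": "2024-01-01"}}},
            ],
            "should": [{"match": {"title": "migration"}}],
        }
    }
}

stages = translate_bool_query(request)
for stage in stages:
    print(stage["stage"], stage["source_clause"])
```

Keeping the source clause attached to each stage makes it easier to review the translation clause by clause rather than rewriting the whole query at once.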
Build Multi-Stage Retriever Pipelines
Define a retriever that chains stages together. This replaces complex DSL queries with a declarative pipeline.
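To make the idea of chained stages concrete, here is a self-contained, in-memory sketch of a staged pipeline: each stage takes a list of documents and returns a new list, and the pipeline applies them in order. It illustrates the pattern only. The stage functions (`keyword_search`, `filter_eq`, `rerank_by`) and the term-count scoring are invented for this example and do not reflect how Mixpeek implements or names its stages.

```python
from typing import Any, Callable

Doc = dict[str, Any]
Stage = Callable[[list[Doc]], list[Doc]]


def keyword_search(term: str) -> Stage:
    """Keep documents containing the term; score by how often it occurs."""
    def run(docs: list[Doc]) -> list[Doc]:
        hits = []
        for doc in docs:
            count = doc["text"].lower().split().count(term.lower())
            if count:
                hits.append({**doc, "score": float(count)})
        return hits
    return run


def filter_eq(field: str, value: Any) -> Stage:
    """Keep documents whose field equals the given value."""
    return lambda docs: [doc for doc in docs if doc.get(field) == value]


def rerank_by(key: Callable[[Doc], float]) -> Stage:
    """Reorder documents from highest to lowest key."""
    return lambda docs: sorted(docs, key=key, reverse=True)


def run_pipeline(docs: list[Doc], stages: list[Stage]) -> list[Doc]:
    """Apply each stage to the output of the previous one."""
    for stage in stages:
        docs = stage(docs)
    return docs


docs = [
    {"id": 4, "text": "vector databases", "lang": "en"},
    {"id": 2, "text": "keyword search basics", "lang": "en"},
    {"id": 3, "text": "vector search en français", "lang": "fr"},
    {"id": 1, "text": "vector search with vector indexes", "lang": "en"},
]

pipeline = [
    keyword_search("vector"),
    filter_eq("lang", "en"),
    rerank_by(lambda doc: doc["score"]),
]

results = run_pipeline(docs, pipeline)
print([doc["id"] for doc in results])
```

Because every stage has the same input and output type, stages can be added, removed, or reordered without touching the others, which is the property that makes a declarative pipeline easier to maintain than a deeply nested query.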
What You Gain
| Capability | Elasticsearch | Mixpeek |
|---|---|---|
| Hybrid search | Manual BM25 + kNN score tuning | Semantic and keyword stages with automatic reranking |
| Feature extraction | External embedding generation, custom ingest pipelines | Built-in ML extractors: CLIP, Whisper, LayoutLM, and more |
| Multimodal | Text-native; images/video require custom plugins | Native support for video, audio, images, and documents in the same namespace |
| Storage tiering | ILM policies you configure and maintain | Automatic tiering: active, cold, archive, managed by the platform |
| Infrastructure | Clusters, shards, replicas, JVM tuning | Fully managed API, no cluster operations |
| Lineage | Documents disconnected from source files | Trace any result back through document, object, and source file |
| Multi-stage pipelines | Complex nested DSL | Declarative stage pipelines: search, filter, rerank, enrich |
Next Steps
Quickstart: Get Mixpeek running in 10 minutes.
Feature Extractors: Learn about automatic feature extraction.
Retrievers: Build multi-stage retrieval pipelines.
Core Concepts: Understand the data model.

