Pinecone is a vector database built for single-embedding KNN search. Mixpeek is a multimodal data warehouse that decomposes files into searchable features, stores them across cost tiers, and reassembles answers through multi-stage retrieval pipelines.
This guide walks you through migrating your search workload from Pinecone to Mixpeek.
## Why Migrate

Pinecone stores and queries individual vectors. Mixpeek processes raw files end-to-end: extracting features, storing documents across tiered storage, and executing multi-stage retrieval pipelines. You stop managing embeddings and start working with content.

| Pinecone | Mixpeek |
| --- | --- |
| You generate embeddings externally | Feature extractors generate embeddings automatically |
| Single-vector KNN per query | Multi-stage pipelines: search, filter, rerank, enrich |
| Flat storage pricing | Tiered storage: hot (active), warm (cold), archive; up to 90% savings |
| Metadata filtering on vector results | Attribute filters, boolean logic, and cross-modal joins as pipeline stages |
| One index per embedding model | One namespace handles multiple modalities and models simultaneously |
## Concept Mapping

| Pinecone | Mixpeek | Notes |
| --- | --- | --- |
| Index | Namespace | Top-level container for your data |
| Namespace | Namespace (via the `X-Namespace` header) | Tenant or environment isolation within a namespace |
| Vector | Document (with features) | Documents contain extracted features, metadata, and lineage back to the source file |
| Upsert | Object upload + collection processing | Data flows through the pipeline: upload to a bucket, and the collection triggers extraction |
| Query (top-k KNN) | Retriever execution (multi-stage) | Retrievers chain stages: semantic search, filters, reranking, enrichment |
| Metadata filter | Attribute filter stage | Filters are composable stages in a retrieval pipeline |
## Migration Steps
### 1. Create a Namespace

Set up a namespace to hold your data. This replaces your Pinecone index.

```bash
curl -X POST https://api.mixpeek.com/v1/namespaces \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace_name": "product-catalog"
  }'
```
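Every call from here on carries the same authentication header, and most also carry the `X-Namespace` header for tenant isolation. If you script the migration in Python, it can help to centralize them; a minimal sketch (this helper is illustrative, not part of any Mixpeek SDK, and the header names are taken from the curl examples in this guide):

```python
def mixpeek_headers(api_key: str, namespace: str = "") -> dict:
    """Build the headers used by the curl examples in this guide."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if namespace:
        # X-Namespace scopes the request to one namespace (tenant isolation).
        # Namespace creation itself omits it, since the namespace doesn't exist yet.
        headers["X-Namespace"] = namespace
    return headers
```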
### 2. Create a Collection with Feature Extractors

Define what features to extract from your data. This replaces the external embedding step you had with Pinecone.

```bash
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "products",
    "feature_extractor": {
      "feature_extractor_name": "multimodal",
      "version": "v1"
    }
  }'
```
### 3. Re-ingest Your Data Through the Pipeline

Upload your source files to a bucket and let the collection process them. Do not try to import your existing Pinecone vectors directly; Mixpeek extracts richer, multimodal features from your raw content. Never insert vectors straight into the storage layer. All data must flow through the ingestion pipeline (bucket upload, collection trigger, feature extraction), which ensures proper lineage, validation, and multimodal indexing.

```bash
# Upload objects to a bucket
curl -X POST https://api.mixpeek.com/v1/buckets/{bucket_id}/objects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "key_prefix": "/products",
    "blobs": [
      { "property": "image", "url": "s3://your-bucket/product-001.jpg" },
      { "property": "description", "url": "s3://your-bucket/product-001.json" }
    ]
  }'
```
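If your Pinecone metadata already records where each source file lives, re-ingestion can be scripted: walk your existing records and turn each one into a bucket-upload body. A minimal sketch, assuming a hypothetical metadata shape with `image_url` and `description_url` fields (adapt the field names to whatever you actually stored; if your metadata holds no source URLs, you will need to recover the file locations another way):

```python
def blobs_payload(record: dict, key_prefix: str = "/products") -> dict:
    """Turn one Pinecone metadata record into a Mixpeek bucket-upload body.

    The `image_url` / `description_url` keys are assumptions about your
    metadata schema, not a Mixpeek or Pinecone convention.
    """
    blobs = []
    if "image_url" in record:
        blobs.append({"property": "image", "url": record["image_url"]})
    if "description_url" in record:
        blobs.append({"property": "description", "url": record["description_url"]})
    return {"key_prefix": key_prefix, "blobs": blobs}

# Example record, as it might come back from a Pinecone query
# with include_metadata=True
record = {
    "image_url": "s3://your-bucket/product-001.jpg",
    "description_url": "s3://your-bucket/product-001.json",
    "category": "footwear",
}
payload = blobs_payload(record)
```

Each resulting payload is what you would POST to the `/buckets/{bucket_id}/objects` endpoint shown above.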
### 4. Create a Retriever with Multi-Stage Pipelines

Build a retriever that goes beyond single-vector KNN. Chain semantic search with filters, reranking, and enrichment.

```bash
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "product-search",
    "stages": [
      {
        "type": "semantic_search",
        "config": {
          "query": "{{INPUT.query}}",
          "top_k": 50
        }
      },
      {
        "type": "attribute_filter",
        "config": {
          "filters": {
            "category": "{{INPUT.category}}"
          }
        }
      },
      {
        "type": "rerank",
        "config": {
          "top_k": 10
        }
      }
    ]
  }'
```
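The `{{INPUT.*}}` placeholders in the stage configs are bound to the `inputs` object you pass at execution time. That substitution happens server-side; a rough client-side sketch of the same idea, for intuition only:

```python
import re

def bind_inputs(config: dict, inputs: dict):
    """Recursively replace {{INPUT.key}} placeholders with input values."""
    pattern = re.compile(r"\{\{INPUT\.(\w+)\}\}")

    def resolve(value):
        if isinstance(value, str):
            match = pattern.fullmatch(value)
            if match:
                # Whole-string placeholder: keep the input's native type
                return inputs[match.group(1)]
            return pattern.sub(lambda m: str(inputs[m.group(1)]), value)
        if isinstance(value, dict):
            return {k: resolve(v) for k, v in value.items()}
        if isinstance(value, list):
            return [resolve(v) for v in value]
        return value

    return resolve(config)

stage = {"type": "semantic_search", "config": {"query": "{{INPUT.query}}", "top_k": 50}}
bound = bind_inputs(stage, {"query": "red running shoes"})
# bound["config"]["query"] is now "red running shoes"; top_k is untouched
```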
### 5. Test and Verify

Execute your retriever and compare results against your Pinecone baseline.

```bash
curl -X POST https://api.mixpeek.com/v1/retrievers/{retriever_id}/execute \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "red running shoes",
      "category": "footwear"
    },
    "limit": 10
  }'
```
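A quick way to sanity-check the migration is to measure how much the two systems' top-k results overlap for a set of benchmark queries. A sketch, assuming you can reduce both result sets to lists of shared identifiers (the internal IDs will differ between systems, so map them through whatever key both sides share, such as a SKU):

```python
def topk_overlap(baseline_ids: list, candidate_ids: list, k: int = 10) -> float:
    """Fraction of the baseline top-k that also appears in the candidate top-k."""
    base = set(baseline_ids[:k])
    cand = set(candidate_ids[:k])
    if not base:
        return 0.0
    return len(base & cand) / len(base)

# Hypothetical top results from each system, keyed by a shared SKU
pinecone_top = ["sku-1", "sku-2", "sku-3", "sku-4", "sku-5"]
mixpeek_top = ["sku-2", "sku-1", "sku-9", "sku-4", "sku-5"]
print(topk_overlap(pinecone_top, mixpeek_top))  # 0.8
```

Perfect overlap is not the goal; the multi-stage pipeline is expected to rank differently. Large divergences, though, are worth inspecting query by query.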
## Side-by-Side Comparison

The Mixpeek retriever does in one API call what requires multiple steps with Pinecone: embedding generation, vector search, and post-processing.
```python
import pinecone
from sentence_transformers import SentenceTransformer

# You manage the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# You generate the embedding
query_embedding = model.encode("red running shoes").tolist()

# Single-vector KNN search
pinecone.init(api_key="PINECONE_KEY", environment="us-east-1")
index = pinecone.Index("product-catalog")
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": "footwear"},
    include_metadata=True,
)

for match in results["matches"]:
    print(match["id"], match["score"], match["metadata"])
```
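For contrast, the Mixpeek side of the same search collapses to a single HTTP call against the execute endpoint shown earlier. A stdlib-only sketch; the retriever ID and API key are placeholders, and the request shape mirrors the curl example in step 5:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"      # placeholder
RETRIEVER_ID = "ret_abc123"   # placeholder: use your retriever's ID

def build_search_request(query: str, category: str, limit: int = 10):
    """Assemble the URL, headers, and body for one retriever execution."""
    url = f"https://api.mixpeek.com/v1/retrievers/{RETRIEVER_ID}/execute"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "X-Namespace": "product-catalog",
        "Content-Type": "application/json",
    }
    body = {"inputs": {"query": query, "category": category}, "limit": limit}
    return url, headers, body

def search(query: str, category: str) -> dict:
    url, headers, body = build_search_request(query, category)
    # Embedding, search, filtering, and reranking all happen server-side
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

No embedding model to host, no vector to build, no post-processing loop: the pipeline defined in step 4 handles all of it.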
## What You Gain

| Capability | Pinecone | Mixpeek |
| --- | --- | --- |
| Multi-stage retrieval | Single KNN query; post-processing is your problem | Chain search, filter, rerank, and enrich stages in one pipeline |
| Automatic feature extraction | You build and maintain embedding pipelines | Feature extractors handle it: CLIP, Whisper, LayoutLM, and more |
| Tiered storage | All vectors at one price tier | Hot, cold, and archive tiers; up to 90% savings on infrequently accessed data |
| Multimodal search | One embedding model per index | Search across text, images, video, and audio in the same namespace |
| No per-query vector fees | Per-read pricing on every query | Flat API pricing, no per-vector-read charges |
| Complete lineage | Vectors disconnected from source files | Trace any result back through document, object, and source file |
## Next Steps

- **Quickstart**: Get Mixpeek running in 10 minutes
- **Feature Extractors**: Learn about automatic feature extraction
- **Retrievers**: Build multi-stage retrieval pipelines
- **Core Concepts**: Understand the data model