Pinecone is a vector database built for single-embedding KNN search. Mixpeek is a multimodal data warehouse that decomposes files into searchable features, stores them across cost tiers, and reassembles answers through multi-stage retrieval pipelines. This guide walks you through migrating your search workload from Pinecone to Mixpeek.

Why Migrate

Pinecone stores and queries individual vectors. Mixpeek processes raw files end-to-end: extracting features, storing documents across tiered storage, and executing multi-stage retrieval pipelines. You stop managing embeddings and start working with content.
Pinecone | Mixpeek
You generate embeddings externally | Feature extractors generate embeddings automatically
Single-vector KNN per query | Multi-stage pipelines: search, filter, rerank, enrich
Flat storage pricing | Tiered storage: hot, cold, and archive tiers, with up to 90% savings on infrequently accessed data
Metadata filtering on vector results | Attribute filters, boolean logic, and cross-modal joins as pipeline stages
One index per embedding model | One namespace handles multiple modalities and models simultaneously

Concept Mapping

Pinecone | Mixpeek | Notes
Index | Namespace | Top-level container for your data
Namespace | Namespace (via X-Namespace header) | Tenant or environment isolation within a namespace
Vector | Document (with features) | Documents contain extracted features, metadata, and lineage back to the source file
Upsert | Object upload + Collection processing | Data flows through the pipeline: upload to bucket, collection triggers extraction
Query (top-k KNN) | Retriever execution (multi-stage) | Retrievers chain stages: semantic search, filters, reranking, enrichment
Metadata filter | Attribute filter stage | Filters are composable stages in a retrieval pipeline

Migration Steps

1. Create a Namespace

Set up a namespace to hold your data. This replaces your Pinecone index.
curl -X POST https://api.mixpeek.com/v1/namespaces \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace_name": "product-catalog"
  }'

2. Create a Collection with Feature Extractors

Define what features to extract from your data. This replaces the external embedding step you had with Pinecone.
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "products",
    "feature_extractor": {
      "feature_extractor_name": "multimodal",
      "version": "v1"
    }
  }'

3. Re-ingest Your Data Through the Pipeline

Upload your source files to a bucket and let the collection process them. Do not try to import your existing Pinecone vectors directly; Mixpeek extracts richer, multimodal features from your raw content.
Warning: Never insert vectors directly into the storage layer. All data must flow through the ingestion pipeline: bucket upload, collection trigger, feature extraction. This ensures proper lineage, validation, and multimodal indexing.
# Upload objects to an existing bucket (replace {bucket_id} with that bucket's ID)
curl -X POST https://api.mixpeek.com/v1/buckets/{bucket_id}/objects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "key_prefix": "/products",
    "blobs": [
      { "property": "image", "url": "s3://your-bucket/product-001.jpg" },
      { "property": "description", "url": "s3://your-bucket/product-001.json" }
    ]
  }'
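When you have many source objects to re-ingest, it can help to generate the upload bodies programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_object_payload` is hypothetical, and it assumes each product has an image and a JSON description stored under the same base URI and named after the product ID, as in the example above. It only builds the request bodies; sending them is left to your HTTP client of choice.

```python
import json


def build_object_payload(product_id, base_uri, key_prefix="/products"):
    """Build one upload body in the shape shown in the curl example.

    Assumes (hypothetically) that files are named <product_id>.jpg and
    <product_id>.json under base_uri.
    """
    return {
        "key_prefix": key_prefix,
        "blobs": [
            {"property": "image", "url": f"{base_uri}/{product_id}.jpg"},
            {"property": "description", "url": f"{base_uri}/{product_id}.json"},
        ],
    }


# Example product IDs; in practice these would come from your catalog.
product_ids = ["product-001", "product-002"]
payloads = [build_object_payload(pid, "s3://your-bucket") for pid in product_ids]

# Print the first body to check it against the curl example.
print(json.dumps(payloads[0], indent=2))
```

Each element of `payloads` can then be posted to the bucket's objects endpoint with the same headers as the curl command.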

4. Create a Retriever with Multi-Stage Pipelines

Build a retriever that goes beyond single-vector KNN. Chain semantic search with filters, reranking, and enrichment.
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "product-search",
    "stages": [
      {
        "type": "semantic_search",
        "config": {
          "query": "{{INPUT.query}}",
          "top_k": 50
        }
      },
      {
        "type": "attribute_filter",
        "config": {
          "filters": {
            "category": "{{INPUT.category}}"
          }
        }
      },
      {
        "type": "rerank",
        "config": {
          "top_k": 10
        }
      }
    ]
  }'
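If you maintain several retrievers, you may prefer to assemble the definition in code rather than by hand. The sketch below is illustrative only: `build_retriever_payload` is a hypothetical helper, and the check that the rerank `top_k` does not exceed the search `top_k` is a design choice made here, not a documented API rule. It reproduces the search, filter, rerank structure and the `{{INPUT.*}}` placeholders from the request above.

```python
import json


def build_retriever_payload(name, filter_fields, candidate_k=50, final_k=10):
    """Assemble a search -> filter -> rerank retriever definition."""
    # The reranker can only return what the search stage handed it,
    # so a larger final_k is most likely a configuration mistake.
    if final_k > candidate_k:
        raise ValueError("final_k should not exceed candidate_k")
    return {
        "name": name,
        "stages": [
            {
                "type": "semantic_search",
                "config": {"query": "{{INPUT.query}}", "top_k": candidate_k},
            },
            {
                "type": "attribute_filter",
                "config": {
                    # One placeholder per filter field, e.g. {{INPUT.category}}
                    "filters": {f: "{{INPUT." + f + "}}" for f in filter_fields}
                },
            },
            {"type": "rerank", "config": {"top_k": final_k}},
        ],
    }


payload = build_retriever_payload("product-search", ["category"])
print(json.dumps(payload, indent=2))
```

Retrieving a wider candidate set (50) before filtering and reranking down to 10 gives the later stages room to discard poor matches without starving the final result list.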

5. Test and Verify

Execute your retriever, substituting its ID for {retriever_id}, and compare the results against your Pinecone baseline.
curl -X POST https://api.mixpeek.com/v1/retrievers/{retriever_id}/execute \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "red running shoes",
      "category": "footwear"
    },
    "limit": 10
  }'
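One simple way to compare against a baseline is top-k overlap: the fraction of Pinecone's top results that also appear in Mixpeek's top results for the same query. The sketch below assumes both systems can be mapped to a shared identifier such as a product SKU (the `sku-*` values are made up for illustration). Because Mixpeek re-extracts features rather than reusing your old vectors, some divergence is expected, so treat a low score as a prompt for manual review rather than a failure.

```python
def overlap_at_k(baseline_ids, candidate_ids, k=10):
    """Fraction of the baseline's top-k IDs that also appear in the candidate's top-k."""
    baseline_top = list(baseline_ids)[:k]
    candidate_top = set(list(candidate_ids)[:k])
    if not baseline_top:
        return 0.0
    hits = sum(1 for item in baseline_top if item in candidate_top)
    return hits / len(baseline_top)


# Illustrative IDs only; replace with IDs collected from each system.
pinecone_ids = ["sku-1", "sku-2", "sku-3", "sku-4"]
mixpeek_ids = ["sku-2", "sku-1", "sku-5", "sku-3"]

print(overlap_at_k(pinecone_ids, mixpeek_ids, k=4))  # 0.75
```

Running this over a fixed set of representative queries gives a rough regression signal before you switch traffic.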

Side-by-Side Comparison

The Mixpeek retriever does in one API call what requires multiple steps with Pinecone: embedding generation, vector search, and post-processing. The Pinecone side of that comparison looks like this:
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

# You manage the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# You generate the embedding
query_embedding = model.encode("red running shoes").tolist()

# Single-vector KNN search
# (Pinecone client v3+; pinecone.init() from older clients has been removed)
pc = Pinecone(api_key="PINECONE_KEY")
index = pc.Index("product-catalog")

results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": "footwear"},
    include_metadata=True
)

for match in results["matches"]:
    print(match["id"], match["score"], match["metadata"])
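For the Mixpeek side of the comparison, the same search can be sketched in Python as a single retriever call. This is a minimal illustration using only the standard library, not an official SDK: the helper names are hypothetical, `RETRIEVER_ID` is a placeholder, and the endpoint, headers, and body are taken from the curl command in step 5. The response schema is not shown in this guide, so the sketch returns the parsed JSON as-is.

```python
import json
import urllib.request

API_BASE = "https://api.mixpeek.com/v1"


def build_execute_request(retriever_id, api_key, namespace, inputs, limit=10):
    """Build the POST request for executing a retriever (mirrors the step 5 curl call)."""
    body = json.dumps({"inputs": inputs, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/retrievers/{retriever_id}/execute",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
    )


def run_retriever(request):
    """Send the request and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# No embedding model and no manual post-processing: just the query inputs.
request = build_execute_request(
    retriever_id="RETRIEVER_ID",
    api_key="YOUR_API_KEY",
    namespace="product-catalog",
    inputs={"query": "red running shoes", "category": "footwear"},
)
print(request.full_url)
```

Calling `run_retriever(request)` with a real key and retriever ID performs the search; embedding, filtering, and reranking happen server-side in the retriever's stages.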

What You Gain

Capability | Pinecone | Mixpeek
Multi-stage retrieval | Single KNN query; post-processing is your problem | Chain search, filter, rerank, and enrich stages in one pipeline
Automatic feature extraction | You build and maintain embedding pipelines | Feature extractors handle it: CLIP, Whisper, LayoutLM, and more
Tiered storage | All vectors at one price tier | Hot, cold, and archive tiers, up to 90% savings on infrequently accessed data
Multimodal search | One embedding model per index | Search across text, images, video, and audio in the same namespace
No per-query vector fees | Per-read pricing on every query | Flat API pricing, no per-vector-read charges
Complete lineage | Vectors disconnected from source files | Trace any result back through document, object, and source file

Next Steps

Quickstart: Get Mixpeek running in 10 minutes
Feature Extractors: Learn about automatic feature extraction
Retrievers: Build multi-stage retrieval pipelines
Core Concepts: Understand the data model