Pinecone is a vector database built for single-embedding KNN search. Mixpeek is a multimodal data warehouse that decomposes files into searchable features, stores them across cost tiers, and reassembles answers through multi-stage retrieval pipelines.
This guide walks you through migrating your search workload from Pinecone to Mixpeek.
## Why Migrate

Pinecone stores and queries individual vectors. Mixpeek processes raw files end-to-end: extracting features, storing documents across tiered storage, and executing multi-stage retrieval pipelines. You stop managing embeddings and start working with content.

| Pinecone | Mixpeek |
| --- | --- |
| You generate embeddings externally | Feature extractors generate embeddings automatically |
| Single-vector KNN per query | Multi-stage pipelines: search, filter, rerank, enrich |
| Flat storage pricing | Tiered storage: hot (active), warm (cold), archive; up to 90% savings |
| Metadata filtering on vector results | Attribute filters, boolean logic, and cross-modal joins as pipeline stages |
| One index per embedding model | One namespace handles multiple modalities and models simultaneously |
## Concept Mapping

| Pinecone | Mixpeek | Notes |
| --- | --- | --- |
| Index | Namespace | Top-level container for your data |
| Namespace | Namespace (via the `X-Namespace` header) | Tenant or environment isolation within a namespace |
| Vector | Document (with features) | Documents contain extracted features, metadata, and lineage back to the source file |
| Upsert | Object upload + collection processing | Data flows through the pipeline: upload to a bucket, and the collection triggers extraction |
| Query (top-k KNN) | Retriever execution (multi-stage) | Retrievers chain stages: semantic search, filters, reranking, enrichment |
| Metadata filter | Attribute filter stage | Filters are composable stages in a retrieval pipeline |
## Migration Steps
### 1. Create a Namespace

Set up a namespace to hold your data. This replaces your Pinecone index.

```bash
curl -X POST https://api.mixpeek.com/v1/namespaces \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace_name": "product-catalog"
  }'
```
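Every call from here on carries the same authentication header, and most also carry the `X-Namespace` header for tenant isolation. If you script the migration in Python, it can help to centralize them; a minimal sketch (this helper is illustrative, not part of any Mixpeek SDK, and the header names are taken from the curl examples in this guide):

```python
def mixpeek_headers(api_key: str, namespace: str = "") -> dict:
    """Build the headers used by the curl examples in this guide."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if namespace:
        # X-Namespace scopes the request to one namespace (tenant isolation).
        # Namespace creation itself omits it, since the namespace doesn't exist yet.
        headers["X-Namespace"] = namespace
    return headers
```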
### 2. Create a Collection with Feature Extractors

Define what features to extract from your data. This replaces the external embedding step you had with Pinecone.

```bash
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "products",
    "feature_extractor": {
      "feature_extractor_name": "multimodal",
      "version": "v1"
    }
  }'
```
### 3. Re-ingest Your Data Through the Pipeline

Upload your source files to a bucket and let the collection process them. Do not try to import your existing Pinecone vectors directly; Mixpeek extracts richer, multimodal features from your raw content. Never insert vectors straight into the storage layer. All data must flow through the ingestion pipeline (bucket upload, collection trigger, feature extraction), which ensures proper lineage, validation, and multimodal indexing.

```bash
# Upload objects to a bucket
curl -X POST https://api.mixpeek.com/v1/buckets/{bucket_id}/objects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "key_prefix": "/products",
    "blobs": [
      { "property": "image", "url": "s3://your-bucket/product-001.jpg" },
      { "property": "description", "url": "s3://your-bucket/product-001.json" }
    ]
  }'
```
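If your Pinecone metadata already records where each source file lives, re-ingestion can be scripted: walk your existing records and turn each one into a bucket-upload body. A minimal sketch, assuming a hypothetical metadata shape with `image_url` and `description_url` fields (adapt the field names to whatever you actually stored; if your metadata holds no source URLs, you will need to recover the file locations another way):

```python
def blobs_payload(record: dict, key_prefix: str = "/products") -> dict:
    """Turn one Pinecone metadata record into a Mixpeek bucket-upload body.

    The `image_url` / `description_url` keys are assumptions about your
    metadata schema, not a Mixpeek or Pinecone convention.
    """
    blobs = []
    if "image_url" in record:
        blobs.append({"property": "image", "url": record["image_url"]})
    if "description_url" in record:
        blobs.append({"property": "description", "url": record["description_url"]})
    return {"key_prefix": key_prefix, "blobs": blobs}

# Example record, as it might come back from a Pinecone query
# with include_metadata=True
record = {
    "image_url": "s3://your-bucket/product-001.jpg",
    "description_url": "s3://your-bucket/product-001.json",
    "category": "footwear",
}
payload = blobs_payload(record)
```

Each resulting payload is what you would POST to the `/buckets/{bucket_id}/objects` endpoint shown above.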
### 4. Create a Retriever with Multi-Stage Pipelines

Build a retriever that goes beyond single-vector KNN. Chain semantic search with filters, reranking, and enrichment.

```bash
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "product-search",
    "stages": [
      {
        "type": "semantic_search",
        "config": {
          "query": "{{INPUT.query}}",
          "top_k": 50
        }
      },
      {
        "type": "attribute_filter",
        "config": {
          "filters": {
            "category": "{{INPUT.category}}"
          }
        }
      },
      {
        "type": "rerank",
        "config": {
          "top_k": 10
        }
      }
    ]
  }'
```
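The `{{INPUT.*}}` placeholders in the stage configs are bound to the `inputs` object you pass at execution time. That substitution happens server-side; a rough client-side sketch of the same idea, for intuition only:

```python
import re

def bind_inputs(config: dict, inputs: dict):
    """Recursively replace {{INPUT.key}} placeholders with input values."""
    pattern = re.compile(r"\{\{INPUT\.(\w+)\}\}")

    def resolve(value):
        if isinstance(value, str):
            match = pattern.fullmatch(value)
            if match:
                # Whole-string placeholder: keep the input's native type
                return inputs[match.group(1)]
            return pattern.sub(lambda m: str(inputs[m.group(1)]), value)
        if isinstance(value, dict):
            return {k: resolve(v) for k, v in value.items()}
        if isinstance(value, list):
            return [resolve(v) for v in value]
        return value

    return resolve(config)

stage = {"type": "semantic_search", "config": {"query": "{{INPUT.query}}", "top_k": 50}}
bound = bind_inputs(stage, {"query": "red running shoes"})
# bound["config"]["query"] is now "red running shoes"; top_k is untouched
```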
### 5. Test and Verify

Execute your retriever and compare results against your Pinecone baseline.

```bash
curl -X POST https://api.mixpeek.com/v1/retrievers/{retriever_id}/execute \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Namespace: product-catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "red running shoes",
      "category": "footwear"
    },
    "limit": 10
  }'
```
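A quick way to sanity-check the migration is to measure how much the two systems' top-k results overlap for a set of benchmark queries. A sketch, assuming you can reduce both result sets to lists of shared identifiers (the internal IDs will differ between systems, so map them through whatever key both sides share, such as a SKU):

```python
def topk_overlap(baseline_ids: list, candidate_ids: list, k: int = 10) -> float:
    """Fraction of the baseline top-k that also appears in the candidate top-k."""
    base = set(baseline_ids[:k])
    cand = set(candidate_ids[:k])
    if not base:
        return 0.0
    return len(base & cand) / len(base)

# Hypothetical top results from each system, keyed by a shared SKU
pinecone_top = ["sku-1", "sku-2", "sku-3", "sku-4", "sku-5"]
mixpeek_top = ["sku-2", "sku-1", "sku-9", "sku-4", "sku-5"]
print(topk_overlap(pinecone_top, mixpeek_top))  # 0.8
```

Perfect overlap is not the goal; the multi-stage pipeline is expected to rank differently. Large divergences, though, are worth inspecting query by query.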
## Side-by-Side Comparison

The Mixpeek retriever does in one API call what requires multiple steps with Pinecone: embedding generation, vector search, and post-processing.
```python
import pinecone
from sentence_transformers import SentenceTransformer

# You manage the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# You generate the embedding
query_embedding = model.encode("red running shoes").tolist()

# Single-vector KNN search
pinecone.init(api_key="PINECONE_KEY", environment="us-east-1")
index = pinecone.Index("product-catalog")
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": "footwear"},
    include_metadata=True,
)

for match in results["matches"]:
    print(match["id"], match["score"], match["metadata"])
```
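For contrast, the Mixpeek side of the same search collapses to a single HTTP call against the execute endpoint shown earlier. A stdlib-only sketch; the retriever ID and API key are placeholders, and the request shape mirrors the curl example in step 5:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"      # placeholder
RETRIEVER_ID = "ret_abc123"   # placeholder: use your retriever's ID

def build_search_request(query: str, category: str, limit: int = 10):
    """Assemble the URL, headers, and body for one retriever execution."""
    url = f"https://api.mixpeek.com/v1/retrievers/{RETRIEVER_ID}/execute"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "X-Namespace": "product-catalog",
        "Content-Type": "application/json",
    }
    body = {"inputs": {"query": query, "category": category}, "limit": limit}
    return url, headers, body

def search(query: str, category: str) -> dict:
    url, headers, body = build_search_request(query, category)
    # Embedding, search, filtering, and reranking all happen server-side
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

No embedding model to host, no vector to build, no post-processing loop: the pipeline defined in step 4 handles all of it.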
## What You Gain

| Capability | Pinecone | Mixpeek |
| --- | --- | --- |
| Multi-stage retrieval | Single KNN query; post-processing is your problem | Chain search, filter, rerank, and enrich stages in one pipeline |
| Automatic feature extraction | You build and maintain embedding pipelines | Feature extractors handle it: CLIP, Whisper, LayoutLM, and more |
| Tiered storage | All vectors at one price tier | Hot, cold, and archive tiers; up to 90% savings on infrequently accessed data |
| Multimodal search | One embedding model per index | Search across text, images, video, and audio in the same namespace |
| No per-query vector fees | Per-read pricing on every query | Flat API pricing, no per-vector-read charges |
| Complete lineage | Vectors disconnected from source files | Trace any result back through document, object, and source file |
## Next Steps

- **Quickstart**: Get Mixpeek running in 10 minutes
- **Feature Extractors**: Learn about automatic feature extraction
- **Retrievers**: Build multi-stage retrieval pipelines
- **Core Concepts**: Understand the data model