~8ms hot search, 50K+ writes/s, and 10K+ queries/s in prod
Cost calculator: estimate your monthly MVS bill based on vector count, dimensions, and usage. All pricing is pay-as-you-go with no upfront commitments.
Benchmarks vs competitors (50K vectors, 768d, concurrency=1): p50 latency, queries/sec, and Recall@10.
MVS stores vectors on your own object storage (S3/GCS/B2) and keeps PQ-compressed indexes in RAM: competitive latency, scale-to-zero idle cost, and 10–80x lower cost than Pinecone or Weaviate at scale.
Usage-Based Pricing
Pay only for what you use. Idle namespaces scale to zero. No node provisioning.
| Price | Item | Details |
|---|---|---|
| $0.023 per GB / month | Storage | S3 / GCS / B2 pass-through |
| $25 per GB / month | Hot Cache | PQ indexes in RAM, sub-10ms |
| $1 per 1M writes | Writes | Upsert, update, delete |
| $1 per 1M queries | Queries | Idle namespaces scale to zero |
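The unit prices above can be folded into a quick back-of-envelope estimate. A minimal sketch, assuming float32 embeddings (4 bytes per dimension) and an illustrative ~32 bytes per vector for the PQ-compressed hot cache (an assumed compression ratio, not a published MVS figure):

```python
def monthly_cost(vectors, dims, writes_m, queries_m, hot_fraction=1.0):
    """Estimate a monthly MVS bill from the published unit prices.

    Assumptions (not from MVS docs): float32 embeddings (4 bytes/dim),
    and PQ compresses each vector to ~32 bytes in the hot cache.
    """
    raw_gb = vectors * dims * 4 / 1e9          # object-storage footprint
    pq_gb = vectors * 32 / 1e9 * hot_fraction  # RAM-resident PQ index
    storage = raw_gb * 0.023   # $0.023 per GB / month
    hot_cache = pq_gb * 25.0   # $25 per GB / month
    writes = writes_m * 1.0    # $1 per 1M writes
    queries = queries_m * 1.0  # $1 per 1M queries
    return round(storage + hot_cache + writes + queries, 2)

# 10M vectors at 768d, 5M writes and 20M queries per month
print(monthly_cost(10_000_000, 768, writes_m=5, queries_m=20))  # → 33.71
```

With `hot_fraction` below 1.0 you can model cold namespaces that have scaled down to storage-only cost.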
How MVS compares at scale
| Vectors | MVS | turbopuffer | Qdrant | Pinecone | Weaviate |
|---|---|---|---|---|---|
| 1M | Free | $10 | $120 | $500 | $73 |
| 10M | $49 | $65 | $460 | $2,700 | $730 |
| 100M | $299 | $358 | $1,255 | $25,000 | $7,300 |
| 1B | $1,999 | $2,800 | $12,500 | $26,000 | $73,000 |
| 5B | $7,999 | Contact sales | Contact sales | Contact sales | Contact sales |
| 10B | $14,999 | Contact sales | Contact sales | Contact sales | Contact sales |
Competitor prices from public pricing calculators (768d, ~100 QPS). turbopuffer = serverless S3-native with tiered discounts; Qdrant = dedicated cluster; Pinecone = serverless read units; Weaviate = per-dimension pricing. MVS includes scale-to-zero — idle namespaces cost only storage. Benchmark repo.
Feature Comparison
MVS vs the leading vector databases. Several of the capabilities below are MVS-exclusive.
| Capability | Pinecone | Qdrant | Turbopuffer | MVS |
|---|---|---|---|---|
| Search | | | | |
| Dense vector search (ANN): Approximate nearest neighbor search over high-dimensional embeddings. The foundation of semantic search -- find results by meaning, not keywords. | | | | |
| Sparse vector search: Search using sparse vectors like SPLADE or learned sparse embeddings. Captures keyword-level signals that dense vectors miss. | | | | |
| BM25 full-text search: Classic keyword search built on an inverted index. MVS uses Tantivy natively -- no workarounds or external engines needed. | SPLADE workaround | | | Native Tantivy |
| Multi-dense (ColBERT): Late-interaction retrieval that stores per-token embeddings for higher recall. Enables token-level matching without collapsing to a single vector. | | | | |
| Hybrid search (RRF/DBSF fusion): Combine dense, sparse, and keyword results into a single ranked list using Reciprocal Rank Fusion or Distribution-Based Score Fusion. | | | | |
| Multi-stage retrieval pipelines: Chain retrieval stages -- e.g. broad recall with ANN, then re-rank with a cross-encoder -- in a single query. Reduces latency vs round-trips. | | | | |
| Standing queries (push on match): Register a persistent query that fires a webhook whenever a newly ingested document matches. Useful for alerting, monitoring, and real-time feeds. | | | | |
| Semantic JOINs across namespaces: Join two namespaces by vector similarity -- like a SQL JOIN but on embeddings. No denormalization or data duplication required. | | | | |
| Data Operations | | | | |
| Aggregation (GROUP BY, COUNT, SUM, AVG): Run analytics directly on your vector store. Group documents by metadata fields and compute counts, averages, sums -- no ETL to a data warehouse. | | | | |
| Cross-shard transactions (2PC): Atomic writes across multiple shards using two-phase commit. Ensures all-or-nothing consistency even at billion-scale datasets. | | | | |
| Optimistic concurrency (_version): Prevent write conflicts with version-based optimistic locking. Critical for multi-writer workloads where two processes might update the same document. | | | | |
| Change streams (WAL-tailing, SSE): Subscribe to real-time insert/update/delete events via Server-Sent Events. Build reactive pipelines without polling your database. | | | | |
| Time-travel queries (WAL replay): Query your data as it existed at a past point in time by replaying the write-ahead log. Useful for debugging, auditing, and reproducibility. | | | | |
| Document version history: Every mutation is versioned. Roll back a document to any prior state or diff two versions to see exactly what changed. | | | | |
| Query audit log: Full audit trail of every query executed -- who ran it, when, and what was returned. Essential for compliance and debugging in production. | | | | |
| Reliability & Governance | | | | |
| Storage tiering (hot/cold/archive): Automatically move infrequently accessed data from memory/SSD to object storage. Cut costs without manual data management. | | | | Automatic, object storage-backed |
| Retention policies: Set TTLs on documents or namespaces. Data is automatically purged after the retention window -- no cron jobs or manual cleanup. | | | | |
| Namespace catalog (INFORMATION_SCHEMA): Discover all namespaces, their schemas, row counts, and storage usage via a system catalog. Like INFORMATION_SCHEMA in SQL databases. | | | | |
| Multi-tenant isolation (noisy neighbor): Resource isolation between tenants prevents one workload from starving others. Each namespace has independent rate limits and resource quotas. | | | | |
| Priority lanes (QoS scheduling): Assign CRITICAL/NORMAL/BACKGROUND/BULK priority to requests. Higher-priority queries get reserved compute slots and preempt lower-priority work in the shard queue. | | | | |
| Idempotent operations: Every write accepts an idempotency key. Retries from crashes or network timeouts are automatically deduplicated -- no duplicate documents, no double-counted aggregations. | | | | |
| Distributed execution traces: Full distributed trace for every query -- coordinator routing, per-shard timing, filter selectivity, index hits. Debug multi-hop requests across the entire fan-out path. | | | | |
| Agentic Workloads | | | | |
| Streaming partial results (SSE): Get results as shards respond instead of waiting for all shards. Agents evaluate early hits and decide whether to refine or cancel -- the tight feedback loop pattern that defines agentic retrieval. | | | | |
| Query cancellation (cooperative termination): Cancel in-flight fan-out queries that are no longer needed. When an agent fires 5 parallel searches and gets an answer from the first, the other 4 are terminated at the shard level, freeing compute instantly. | | | | |
| Per-agent budget limits: Enforce max queries, writes, and compute per agent or API key at the coordinator level. Prevents runaway autonomous loops -- the specific failure mode where an LLM in a loop issues unbounded queries. | | | | |
| Infrastructure | | | | |
| Object storage-native (no separate DB to manage): Data lives in your object storage (S3, GCS, Azure Blob). No separate database cluster to provision, back up, or scale -- just point MVS at your bucket. | | | | |
| Self-hosted option: Deploy MVS in your own VPC or on-prem. Full control over data residency, network policies, and infrastructure -- no vendor lock-in. | | OSS | | |
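The hybrid-search row above mentions Reciprocal Rank Fusion, which merges ranked lists purely by rank: score(d) = Σ 1/(k + rank_i(d)). A minimal generic sketch of the formula (an illustration of the technique, not MVS's internal implementation):

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    Each input list is ordered best-first; a document's fused score is
    the sum of 1 / (k + rank) over every list it appears in.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]    # ANN results, best first
keyword = ["b", "d", "a"]  # BM25 results, best first
print(rrf_fuse([dense, keyword]))  # → ['b', 'a', 'd', 'c']
```

Documents appearing high in multiple lists ("a" and "b" here) outrank single-list hits, which is why RRF is a robust default for combining dense, sparse, and keyword retrieval.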
API Examples
Capabilities you will not find in any other vector database.
Write documents with dense, sparse, and metadata in a single call.
```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_KEY")

client.namespaces.upsert(
    namespace="products",
    documents=[{
        "id": "doc-001",
        "dense_embedding": [0.12, -0.34, ...],  # 768-d
        "sparse_embedding": {"tokens": [1204, 879], "weights": [0.9, 0.4]},
        "metadata": {"category": "electronics", "price": 299.99},
        "text": "Noise-cancelling wireless headphones",
    }],
)
```
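Per the feature table, every MVS write accepts an idempotency key, so the upsert above can be retried safely after a timeout. A toy sketch of how key-based deduplication works in principle; the class and method names here are illustrative, not MVS internals:

```python
import uuid

class DedupStore:
    """Toy write path that drops retries carrying an already-seen idempotency key."""

    def __init__(self):
        self.seen = set()  # idempotency keys already applied
        self.docs = {}

    def upsert(self, doc, idempotency_key):
        if idempotency_key in self.seen:
            return "duplicate-ignored"  # retry of a write that already landed
        self.seen.add(idempotency_key)
        self.docs[doc["id"]] = doc
        return "applied"

store = DedupStore()
key = str(uuid.uuid4())  # client generates one key per logical write
print(store.upsert({"id": "doc-001", "text": "headphones"}, key))  # → applied
print(store.upsert({"id": "doc-001", "text": "headphones"}, key))  # → duplicate-ignored
```

The client reuses the same key across retries of one logical write, so a crash between request and acknowledgment can never double-apply the mutation.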
How it works
A request flows through the architecture:
- SDK / REST API
- Coordinator: 1 per namespace · consistent-hash routing
- Snapshot on S3: any S3-compatible object store
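"Consistent-hash routing" above refers to hashing document IDs onto a ring of shards, so adding or removing a shard remaps only a small slice of keys. A generic illustration of the technique (not MVS's actual routing code):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: each shard owns many virtual points on the ring."""

    def __init__(self, shards, vnodes=100):
        self.ring = sorted(
            (self._h(f"{shard}:{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(key):
        # Stable hash -> position on the ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def route(self, doc_id):
        # The first ring point clockwise of the key's hash owns the key.
        idx = bisect.bisect(self.keys, self._h(doc_id)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["shard-0", "shard-1", "shard-2"])
print(ring.route("doc-001"))  # same doc ID always routes to the same shard
```

Virtual nodes (`vnodes`) spread each shard's ownership evenly around the ring, so load stays balanced even with a small shard count.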
Free Proof of Concept
See it working on your data
Book a 60-minute architecture review. We'll run MVS on your actual workload and benchmark it against your current vector database.
Ethan Steininger
Founder & CEO, Mixpeek
In 60 minutes, you will get:
- 90%+ avg. cost reduction vs Pinecone & Weaviate
- < 1 hr typical migration for 100M vectors
- 10B+ vectors supported per index
> "We run 100M+ vectors on MVS backed by our own GCS bucket. Hot queries come back in single-digit ms, cold namespaces warm up in seconds, and our infra bill is a fraction of what Qdrant Cloud quoted us."
>
> Early access customer, AI infrastructure team
