~8ms hot search, 50K+ writes/s, and 10K+ queries/s in prod
Cost calculator: estimate your monthly MVS bill based on vector count, dimensions, and usage. All pricing is pay-as-you-go with no upfront commitments.
Benchmarks vs competitors (50K vectors, 768d, concurrency=1): p50 latency, queries/sec, and Recall@10.
MVS stores vectors on your own object storage (S3/GCS/B2) and keeps PQ-compressed indexes in RAM: competitive latency, scale-to-zero idle cost, and 10–80x lower cost than Pinecone or Weaviate at scale.
Usage-Based Pricing
Pay only for what you use. Idle namespaces scale to zero. No node provisioning.
| Price | Item | Details |
|---|---|---|
| $0.023 per GB / month | Storage | S3 / GCS / B2 pass-through |
| $25 per GB / month | Hot Cache | PQ indexes in RAM, sub-10ms |
| $1 per 1M writes | Writes | Upsert, update, delete |
| $1 per 1M queries | Queries | Idle namespaces scale to zero |
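The unit prices above can be folded into a quick back-of-envelope estimate. A minimal sketch, assuming float32 embeddings (4 bytes per dimension) and an illustrative ~32 bytes per vector for the PQ-compressed hot cache (an assumed compression ratio, not a published MVS figure):

```python
def monthly_cost(vectors, dims, writes_m, queries_m, hot_fraction=1.0):
    """Estimate a monthly MVS bill from the published unit prices.

    Assumptions (not from MVS docs): float32 embeddings (4 bytes/dim),
    and PQ compresses each vector to ~32 bytes in the hot cache.
    """
    raw_gb = vectors * dims * 4 / 1e9          # object-storage footprint
    pq_gb = vectors * 32 / 1e9 * hot_fraction  # RAM-resident PQ index
    storage = raw_gb * 0.023   # $0.023 per GB / month
    hot_cache = pq_gb * 25.0   # $25 per GB / month
    writes = writes_m * 1.0    # $1 per 1M writes
    queries = queries_m * 1.0  # $1 per 1M queries
    return round(storage + hot_cache + writes + queries, 2)

# 10M vectors at 768d, 5M writes and 20M queries per month
print(monthly_cost(10_000_000, 768, writes_m=5, queries_m=20))  # → 33.71
```

With `hot_fraction` below 1.0 you can model cold namespaces that have scaled down to storage-only cost.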
How MVS compares at scale
| Vectors | MVS | turbopuffer | Qdrant | Pinecone | Weaviate |
|---|---|---|---|---|---|
| 1M | Free | $10 | $120 | $500 | $73 |
| 10M | $49 | $65 | $460 | $2,700 | $730 |
| 100M | $299 | $358 | $1,255 | $25,000 | $7,300 |
| 1B | $1,999 | $2,800 | $12,500 | $26,000 | $73,000 |
| 5B | $7,999 | Contact sales | Contact sales | Contact sales | Contact sales |
| 10B | $14,999 | Contact sales | Contact sales | Contact sales | Contact sales |
Competitor prices from public pricing calculators (768d, ~100 QPS). turbopuffer = serverless S3-native with tiered discounts; Qdrant = dedicated cluster; Pinecone = serverless read units; Weaviate = per-dimension pricing. MVS includes scale-to-zero — idle namespaces cost only storage. Benchmark repo.
Feature Comparison
MVS vs the leading vector databases. Several of the capabilities below are MVS-exclusive.
| Capability | Pinecone | Qdrant | Turbopuffer | MVS |
|---|---|---|---|---|
| Search | | | | |
| Dense vector search (ANN): Approximate nearest neighbor search over high-dimensional embeddings. The foundation of semantic search -- find results by meaning, not keywords. | | | | |
| Sparse vector search: Search using sparse vectors like SPLADE or learned sparse embeddings. Captures keyword-level signals that dense vectors miss. | | | | |
| BM25 full-text search: Classic keyword search built on an inverted index. MVS uses Tantivy natively -- no workarounds or external engines needed. | SPLADE workaround | | | Native Tantivy |
| Multi-dense (ColBERT): Late-interaction retrieval that stores per-token embeddings for higher recall. Enables token-level matching without collapsing to a single vector. | | | | |
| Hybrid search (RRF/DBSF fusion): Combine dense, sparse, and keyword results into a single ranked list using Reciprocal Rank Fusion or Distribution-Based Score Fusion. | | | | |
| Multi-stage retrieval pipelines: Chain retrieval stages -- e.g. broad recall with ANN, then re-rank with a cross-encoder -- in a single query. Reduces latency vs round-trips. | | | | |
| Standing queries (push on match): Register a persistent query that fires a webhook whenever a newly ingested document matches. Useful for alerting, monitoring, and real-time feeds. | | | | |
| Semantic JOINs across namespaces: Join two namespaces by vector similarity -- like a SQL JOIN but on embeddings. No denormalization or data duplication required. | | | | |
| Data Operations | | | | |
| Aggregation (GROUP BY, COUNT, SUM, AVG): Run analytics directly on your vector store. Group documents by metadata fields and compute counts, averages, sums -- no ETL to a data warehouse. | | | | |
| Cross-shard transactions (2PC): Atomic writes across multiple shards using two-phase commit. Ensures all-or-nothing consistency even at billion-scale datasets. | | | | |
| Optimistic concurrency (_version): Prevent write conflicts with version-based optimistic locking. Critical for multi-writer workloads where two processes might update the same document. | | | | |
| Change streams (WAL-tailing, SSE): Subscribe to real-time insert/update/delete events via Server-Sent Events. Build reactive pipelines without polling your database. | | | | |
| Time-travel queries (WAL replay): Query your data as it existed at a past point in time by replaying the write-ahead log. Useful for debugging, auditing, and reproducibility. | | | | |
| Document version history: Every mutation is versioned. Roll back a document to any prior state or diff two versions to see exactly what changed. | | | | |
| Query audit log: Full audit trail of every query executed -- who ran it, when, and what was returned. Essential for compliance and debugging in production. | | | | |
| Reliability & Governance | | | | |
| Storage tiering (hot/cold/archive): Automatically move infrequently accessed data from memory/SSD to object storage. Cut costs without manual data management. | | | | Automatic, object storage-backed |
| Retention policies: Set TTLs on documents or namespaces. Data is automatically purged after the retention window -- no cron jobs or manual cleanup. | | | | |
| Namespace catalog (INFORMATION_SCHEMA): Discover all namespaces, their schemas, row counts, and storage usage via a system catalog. Like INFORMATION_SCHEMA in SQL databases. | | | | |
| Multi-tenant isolation (noisy neighbor): Resource isolation between tenants prevents one workload from starving others. Each namespace has independent rate limits and resource quotas. | | | | |
| Priority lanes (QoS scheduling): Assign CRITICAL/NORMAL/BACKGROUND/BULK priority to requests. Higher-priority queries get reserved compute slots and preempt lower-priority work in the shard queue. | | | | |
| Idempotent operations: Every write accepts an idempotency key. Retries from crashes or network timeouts are automatically deduplicated -- no duplicate documents, no double-counted aggregations. | | | | |
| Distributed execution traces: Full distributed trace for every query -- coordinator routing, per-shard timing, filter selectivity, index hits. Debug multi-hop requests across the entire fan-out path. | | | | |
| Agentic Workloads | | | | |
| Streaming partial results (SSE): Get results as shards respond instead of waiting for all shards. Agents evaluate early hits and decide whether to refine or cancel -- the tight feedback loop pattern that defines agentic retrieval. | | | | |
| Query cancellation (cooperative termination): Cancel in-flight fan-out queries that are no longer needed. When an agent fires 5 parallel searches and gets an answer from the first, the other 4 are terminated at the shard level, freeing compute instantly. | | | | |
| Per-agent budget limits: Enforce max queries, writes, and compute per agent or API key at the coordinator level. Prevents runaway autonomous loops -- the specific failure mode where an LLM in a loop issues unbounded queries. | | | | |
| Infrastructure | | | | |
| Object storage-native (no separate DB to manage): Data lives in your object storage (S3, GCS, Azure Blob). No separate database cluster to provision, back up, or scale -- just point MVS at your bucket. | | | | |
| Self-hosted option: Deploy MVS in your own VPC or on-prem. Full control over data residency, network policies, and infrastructure -- no vendor lock-in. | | OSS | | |
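The hybrid-search row above mentions Reciprocal Rank Fusion, which merges ranked lists purely by rank: score(d) = Σ 1/(k + rank_i(d)). A minimal generic sketch of the formula (an illustration of the technique, not MVS's internal implementation):

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    Each input list is ordered best-first; a document's fused score is
    the sum of 1 / (k + rank) over every list it appears in.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]    # ANN results, best first
keyword = ["b", "d", "a"]  # BM25 results, best first
print(rrf_fuse([dense, keyword]))  # → ['b', 'a', 'd', 'c']
```

Documents appearing high in multiple lists ("a" and "b" here) outrank single-list hits, which is why RRF is a robust default for combining dense, sparse, and keyword retrieval.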
API Examples
Capabilities you will not find in any other vector database.
Write documents with dense, sparse, and metadata in a single call.
```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_KEY")

client.namespaces.upsert(
    namespace="products",
    documents=[{
        "id": "doc-001",
        "dense_embedding": [0.12, -0.34, ...],  # 768-d
        "sparse_embedding": {"tokens": [1204, 879], "weights": [0.9, 0.4]},
        "metadata": {"category": "electronics", "price": 299.99},
        "text": "Noise-cancelling wireless headphones",
    }],
)
```
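Per the feature table, every MVS write accepts an idempotency key, so the upsert above can be retried safely after a timeout. A toy sketch of how key-based deduplication works in principle; the class and method names here are illustrative, not MVS internals:

```python
import uuid

class DedupStore:
    """Toy write path that drops retries carrying an already-seen idempotency key."""

    def __init__(self):
        self.seen = set()  # idempotency keys already applied
        self.docs = {}

    def upsert(self, doc, idempotency_key):
        if idempotency_key in self.seen:
            return "duplicate-ignored"  # retry of a write that already landed
        self.seen.add(idempotency_key)
        self.docs[doc["id"]] = doc
        return "applied"

store = DedupStore()
key = str(uuid.uuid4())  # client generates one key per logical write
print(store.upsert({"id": "doc-001", "text": "headphones"}, key))  # → applied
print(store.upsert({"id": "doc-001", "text": "headphones"}, key))  # → duplicate-ignored
```

The client reuses the same key across retries of one logical write, so a crash between request and acknowledgment can never double-apply the mutation.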
How it works
A request flows through the architecture:
- SDK / REST API
- Coordinator: 1 per namespace · consistent-hash routing
- Snapshot on S3: any S3-compatible object store
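"Consistent-hash routing" above refers to hashing document IDs onto a ring of shards, so adding or removing a shard remaps only a small slice of keys. A generic illustration of the technique (not MVS's actual routing code):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: each shard owns many virtual points on the ring."""

    def __init__(self, shards, vnodes=100):
        self.ring = sorted(
            (self._h(f"{shard}:{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(key):
        # Stable hash -> position on the ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def route(self, doc_id):
        # The first ring point clockwise of the key's hash owns the key.
        idx = bisect.bisect(self.keys, self._h(doc_id)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["shard-0", "shard-1", "shard-2"])
print(ring.route("doc-001"))  # same doc ID always routes to the same shard
```

Virtual nodes (`vnodes`) spread each shard's ownership evenly around the ring, so load stays balanced even with a small shard count.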
Free Proof of Concept
See it working on your data
Book a 60-minute architecture review. We'll run MVS on your actual workload and benchmark it against your current vector database.
Ethan Steininger
Founder & CEO, Mixpeek
In 60 minutes, you will get:
- 90%+ avg. cost reduction vs Pinecone & Weaviate
- < 1 hr typical migration for 100M vectors
- 10B+ vectors supported per index
> "We run 100M+ vectors on MVS backed by our own GCS bucket. Hot queries come back in single-digit ms, cold namespaces warm up in seconds, and our infra bill is a fraction of what Qdrant Cloud quoted us."
>
> Early access customer, AI infrastructure team
