5.5ms p50 vector search, 12ms p50 BM25, 459 QPS at concurrency=10
Cost calculator
Estimate your monthly MVS bill based on vector count, dimensions, and usage. All pricing is pay-as-you-go with no upfront commitments.
vs Competitors: 50K vectors, 768d, concurrency=1
[Chart: p50 latency, QPS (concurrency=10), Recall@10]
MVS stores vectors in your own object storage (S3, GCS, or B2) and keeps PQ-compressed indexes in RAM. The result is competitive single-threaded latency, scale-to-zero idle cost, and 10-80x lower cost than Pinecone or Weaviate at scale. See the full benchmark methodology.
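PQ (product quantization) is what keeps an in-RAM index small. The sketch below illustrates the general technique, not MVS's implementation. Each vector is split into `m` subvectors, and each subvector is replaced by the index of its nearest centroid in a small per-subspace codebook. Queries are then scored with asymmetric distance computation (ADC) against those codes. The function names, the bare-bones k-means trainer, and the parameters (`m`, `k`, `iters`) are assumptions chosen for illustration.

```python
import random


def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _split(vec, m):
    """Split a vector into m contiguous, equal-length subvectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def train_pq(vectors, m, k, iters=10, seed=0):
    """Learn one k-centroid codebook per subspace with a few rounds of plain k-means."""
    assert len(vectors[0]) % m == 0, "dimension must be divisible by m"
    rng = random.Random(seed)
    pieces = [_split(v, m) for v in vectors]
    codebooks = []
    for s in range(m):
        subs = [p[s] for p in pieces]
        centroids = [list(c) for c in rng.sample(subs, k)]  # init from random data points
        for _ in range(iters):
            buckets = [[] for _ in range(k)]
            for sv in subs:
                nearest = min(range(k), key=lambda j: _sqdist(sv, centroids[j]))
                buckets[nearest].append(sv)
            for j, bucket in enumerate(buckets):
                if bucket:  # leave a centroid in place if its cluster is empty
                    centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
        codebooks.append(centroids)
    return codebooks


def encode(vec, codebooks):
    """Replace each subvector with its nearest centroid's index (fits in 1 byte if k <= 256)."""
    return [
        min(range(len(cb)), key=lambda j: _sqdist(sv, cb[j]))
        for sv, cb in zip(_split(vec, len(codebooks)), codebooks)
    ]


def adc_distance(query, code, codebooks):
    """Asymmetric distance: raw query subvectors vs. the centroids named by a stored code."""
    return sum(
        _sqdist(qs, cb[c])
        for qs, cb, c in zip(_split(query, len(codebooks)), codebooks, code)
    )


# Toy data: 200 random 16-d vectors, compressed to 4 codes each (m=4, k=8).
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
books = train_pq(data, m=4, k=8)
codes = [encode(v, books) for v in data]
```

As float32, each toy vector takes 64 bytes, while its code takes 4 bytes. A 768-d float32 vector takes 3,072 bytes, so compressing to a few dozen code bytes is what makes holding the index in RAM affordable.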
Pure Usage-Based Pricing
No per-vector caps. No namespace limits. Pay only for what you use.
| Component | Price | Details |
|---|---|---|
| Storage | $0.023 per GB / month | S3 / GCS / B2 pass-through |
| Hot Cache | $25 per GB / month | PQ indexes in RAM, sub-10ms |
| Writes | $1 per 1M writes | Upsert, update, delete |
| Queries | $1 per 1M queries | Idle namespaces scale to zero |
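The unit rates above are enough for a back-of-the-envelope estimate. The sketch below simply multiplies them out. It is not the official calculator, and it ignores any free tier or plan pricing. It rests on several assumptions: raw vectors are float32 in object storage, each vector's PQ code in the hot cache has a fixed size (96 bytes here), and a GB means 10^9 bytes. The `Workload` and `estimate_monthly_cost` names are hypothetical.

```python
from dataclasses import dataclass

# Unit rates from the pricing table (USD).
STORAGE_PER_GB_MONTH = 0.023
HOT_CACHE_PER_GB_MONTH = 25.0
PER_MILLION_WRITES = 1.0
PER_MILLION_QUERIES = 1.0
GB = 1e9  # assumption: decimal gigabytes


@dataclass
class Workload:
    vectors: int
    dims: int
    writes_per_month: int
    queries_per_month: int
    bytes_per_dim: int = 4         # assumption: raw float32 vectors in object storage
    pq_bytes_per_vector: int = 96  # assumption: PQ code size kept in the hot cache
    hot_fraction: float = 1.0      # assumption: share of vectors held in RAM


def estimate_monthly_cost(w: Workload) -> dict:
    """Multiply the listed unit rates by estimated usage; returns USD per line item."""
    storage_gb = w.vectors * w.dims * w.bytes_per_dim / GB
    hot_gb = w.vectors * w.pq_bytes_per_vector * w.hot_fraction / GB
    costs = {
        "storage": storage_gb * STORAGE_PER_GB_MONTH,
        "hot_cache": hot_gb * HOT_CACHE_PER_GB_MONTH,
        "writes": w.writes_per_month / 1e6 * PER_MILLION_WRITES,
        "queries": w.queries_per_month / 1e6 * PER_MILLION_QUERIES,
    }
    costs["total"] = sum(costs.values())
    return costs


estimate = estimate_monthly_cost(
    Workload(vectors=1_000_000, dims=768, writes_per_month=1_000_000, queries_per_month=10_000_000)
)
for item, usd in estimate.items():
    print(f"{item:>9}: ${usd:,.2f}")
```

Under these assumptions, the hot cache (about $2.40) costs far more than raw storage (about $0.07) for 1M 768-d vectors. Among the storage-related items, the PQ code size and the hot fraction are therefore the inputs that move the bill.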
How MVS compares at scale
| Vectors | MVS | Qdrant | Pinecone | Weaviate |
|---|---|---|---|---|
| 1M | Free | $120 | $500 | $73 |
| 10M | $49 | $460 | $2,700 | $730 |
| 100M | $299 | $1,255 | $25,000 | $7,300 |
| 1B | $1,999 | $12,500 | $26,000 | $73,000 |
| 5B | $7,999 | Contact sales | Contact sales | Contact sales |
| 10B | $14,999 | Contact sales | Contact sales | Contact sales |
Competitor prices are from public pricing calculators (768d, ~100 QPS). Qdrant = dedicated cluster; Pinecone = serverless read units; Weaviate = per-dimension pricing. MVS includes scale-to-zero: idle namespaces cost only storage. See the benchmark repo.
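The "10-80x" figure quoted earlier can be checked against the comparison table. The snippet below divides each competitor price by the MVS price for the rows where MVS is not free. All numbers are copied from the table, and nothing else is assumed.

```python
# (size, MVS, Qdrant, Pinecone, Weaviate) in USD, copied from the comparison table.
ROWS = [
    ("10M", 49, 460, 2_700, 730),
    ("100M", 299, 1_255, 25_000, 7_300),
    ("1B", 1_999, 12_500, 26_000, 73_000),
]


def cost_ratios(rows):
    """Return {size: {vendor: competitor_price / mvs_price}}, rounded to one decimal."""
    vendors = ("Qdrant", "Pinecone", "Weaviate")
    return {
        size: {v: round(p / mvs, 1) for v, p in zip(vendors, prices)}
        for size, mvs, *prices in rows
    }


for size, ratios in cost_ratios(ROWS).items():
    print(size, ratios)
```

Against Pinecone and Weaviate, the ratios run from about 13x (Pinecone at 1B) to about 84x (Pinecone at 100M). Against Qdrant, they range from roughly 4x to 9x.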
Support Tiers
Same usage rates on every tier. Tiers gate support level, not features.
Feature Comparison
MVS vs the leading vector databases. Rows highlighted in purple are MVS-exclusive capabilities.
| Capability | Pinecone | Qdrant | MVS |
|---|---|---|---|
| **Search** | | | |
| **Dense vector search (ANN)**: Approximate nearest neighbor search over high-dimensional embeddings. The foundation of semantic search: find results by meaning, not keywords. | | | |
| **Sparse vector search**: Search using sparse vectors such as SPLADE or other learned sparse embeddings. Captures keyword-level signals that dense vectors miss. | | | |
| **BM25 full-text search**: Classic keyword search built on an inverted index. MVS uses Tantivy natively, so no workarounds or external engines are needed. | SPLADE workaround | | Native Tantivy |
| **Multi-dense (ColBERT)**: Late-interaction retrieval that stores per-token embeddings for higher recall. Enables token-level matching without collapsing a document to a single vector. | | | |
| **Hybrid search (RRF/DBSF fusion)**: Combine dense, sparse, and keyword results into a single ranked list using Reciprocal Rank Fusion or Distribution-Based Score Fusion. | | | |
| **Multi-stage retrieval pipelines**: Chain retrieval stages (e.g., broad recall with ANN, then re-ranking with a cross-encoder) in a single query, avoiding the latency of separate round-trips. | | | |
| **Standing queries (push on match)**: Register a persistent query that fires a webhook whenever a newly ingested document matches. Useful for alerting, monitoring, and real-time feeds. | | | |
| **Semantic JOINs across namespaces**: Join two namespaces by vector similarity, like a SQL JOIN but on embeddings. No denormalization or data duplication required. | | | |
| **Data Operations** | | | |
| **Aggregation (GROUP BY, COUNT, SUM, AVG)**: Run analytics directly on your vector store. Group documents by metadata fields and compute counts, sums, and averages without ETL to a data warehouse. | | | |
| **Cross-shard transactions (2PC)**: Atomic writes across multiple shards using two-phase commit. Ensures all-or-nothing consistency even for billion-scale datasets. | | | |
| **Optimistic concurrency (_version)**: Prevent write conflicts with version-based optimistic locking. Critical for multi-writer workloads where two processes might update the same document. | | | |
| **Change streams (WAL-tailing, SSE)**: Subscribe to real-time insert/update/delete events via Server-Sent Events. Build reactive pipelines without polling your database. | | | |
| **Time-travel queries (WAL replay)**: Query your data as it existed at a past point in time by replaying the write-ahead log. Useful for debugging, auditing, and reproducibility. | | | |
| **Document version history**: Every mutation is versioned. Roll back a document to any prior state, or diff two versions to see exactly what changed. | | | |
| **Query audit log**: A full audit trail of every query executed: who ran it, when, and what was returned. Essential for compliance and debugging in production. | | | |
| **Reliability & Governance** | | | |
| **Storage tiering (hot/cold/archive)**: Automatically move infrequently accessed data from memory/SSD to object storage. Cut costs without manual data management. | | | Automatic, object storage-backed |
| **Retention policies**: Set TTLs on documents or namespaces. Data is automatically purged after the retention window, with no cron jobs or manual cleanup. | | | |
| **Namespace catalog (INFORMATION_SCHEMA)**: Discover all namespaces, their schemas, row counts, and storage usage via a system catalog, like INFORMATION_SCHEMA in SQL databases. | | | |
| **Multi-tenant isolation (noisy neighbor)**: Resource isolation between tenants prevents one workload from starving others. Each namespace has independent rate limits and resource quotas. | | | |
| **Priority lanes (QoS scheduling)**: Assign CRITICAL/NORMAL/BACKGROUND/BULK priority to requests. Higher-priority queries get reserved compute slots and preempt lower-priority work in the shard queue. | | | |
| **Idempotent operations**: Every write accepts an idempotency key. Retries after crashes or network timeouts are automatically deduplicated: no duplicate documents, no double-counted aggregations. | | | |
| **Distributed execution traces**: A full distributed trace for every query (coordinator routing, per-shard timing, filter selectivity, index hits). Debug multi-hop requests across the entire fan-out path. | | | |
| **Agentic Workloads** | | | |
| **Streaming partial results (SSE)**: Get results as shards respond instead of waiting for all shards. Agents evaluate early hits and decide whether to refine or cancel, the tight feedback loop that defines agentic retrieval. | | | |
| **Query cancellation (cooperative termination)**: Cancel in-flight fan-out queries that are no longer needed. When an agent fires 5 parallel searches and gets an answer from the first, the other 4 are terminated at the shard level, freeing compute immediately. | | | |
| **Per-agent budget limits**: Enforce max queries, writes, and compute per agent or API key at the coordinator level. Prevents runaway autonomous loops, the specific failure mode where an LLM in a loop issues unbounded queries. | | | |
| **Infrastructure** | | | |
| **Object storage-native (no separate DB to manage)**: Data lives in your object storage (S3, GCS, Azure Blob). There is no separate database cluster to provision, back up, or scale; just point MVS at your bucket. | | | |
| **Self-hosted option**: Deploy MVS in your own VPC or on-prem. Full control over data residency, network policies, and infrastructure, with no vendor lock-in. | | OSS | |
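Reciprocal Rank Fusion, listed under hybrid search, is simple enough to show in full. This sketch uses the standard formulation: each document scores the sum of 1 / (k + rank) over the result lists it appears in, with the commonly used k = 60. It is a generic illustration, not MVS's fusion code, and the `rrf_fuse` name and sample doc ids are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A list that omits a document contributes nothing for it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["doc-3", "doc-1", "doc-7"]   # e.g. ANN results
bm25 = ["doc-1", "doc-9", "doc-3"]    # e.g. keyword results
sparse = ["doc-1", "doc-3"]           # e.g. sparse-vector results
fused = rrf_fuse([dense, bm25, sparse])
print(fused)
```

RRF only looks at ranks, so it needs no score normalization across dense, sparse, and BM25 results. That property is why RRF is a common default for hybrid search.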
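The `_version` field under optimistic concurrency follows the usual compare-and-set contract. A write states the version it read and is rejected if another writer has committed since. The in-memory store below is a hypothetical illustration of that contract, not MVS's storage engine or API. `VersionedStore`, `VersionConflict`, and the `expected_version` parameter are names made up for this sketch.

```python
import threading


class VersionConflict(Exception):
    """Raised when a write's expected version does not match the stored one."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)
        self._lock = threading.Lock()

    def get(self, doc_id):
        """Return (version, document); a missing document reads as version 0."""
        with self._lock:
            return self._docs.get(doc_id, (0, None))

    def put(self, doc_id, document, expected_version):
        """Write only if the stored version still equals expected_version."""
        with self._lock:
            current, _ = self._docs.get(doc_id, (0, None))
            if current != expected_version:
                raise VersionConflict(f"{doc_id}: expected v{expected_version}, found v{current}")
            self._docs[doc_id] = (current + 1, document)
            return current + 1


store = VersionedStore()
version, _ = store.get("doc-001")  # writers A and B both read version 0
store.put("doc-001", {"price": 279.99}, expected_version=version)  # A commits: now version 1
try:
    store.put("doc-001", {"price": 289.99}, expected_version=version)  # B is stale
    b_succeeded = True
except VersionConflict as err:
    b_succeeded = False
    print("re-read and retry:", err)
```

The losing writer is expected to re-read the document, reapply its change, and retry with the new version instead of silently overwriting A's update.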
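Idempotency keys work by remembering the outcome of each key, so a retried request replays the stored result instead of applying the write twice. The sketch below assumes an in-process dict as the dedup table; a real service would persist it and expire old keys. It uses a running total to show the "no double-counted aggregations" case. `IdempotentCounter` and its method are hypothetical.

```python
import uuid


class IdempotentCounter:
    """Hypothetical aggregate that applies each idempotency key at most once."""

    def __init__(self):
        self.total = 0
        self._results = {}  # idempotency_key -> result returned the first time

    def increment(self, idempotency_key, amount):
        if idempotency_key in self._results:
            # A retry of a write that already landed: replay the stored result, do not re-apply.
            return self._results[idempotency_key]
        self.total += amount
        self._results[idempotency_key] = self.total
        return self.total


counter = IdempotentCounter()
key = str(uuid.uuid4())  # the client picks one key per logical write and reuses it on retry
first = counter.increment(key, 5)
retry = counter.increment(key, 5)  # e.g. resent after a network timeout
counter.increment(str(uuid.uuid4()), 3)  # a genuinely new write
print(counter.total)
```

The key must be generated by the client before the first attempt. If the client generated a new key on each retry, the server would see every retry as a new write.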
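Query cancellation can be illustrated on the client side with Python's asyncio: fire several searches, take the first one that answers, and cancel the rest. In this sketch, `fake_shard_search` is a hypothetical stand-in that sleeps to simulate shard latency. It does not model how MVS propagates cancellation to the shards themselves.

```python
import asyncio


async def fake_shard_search(name, delay):
    """Hypothetical search request: sleeps to simulate latency, then returns a result."""
    await asyncio.sleep(delay)
    return f"answer from {name}"


async def first_answer(searches):
    """Run searches concurrently, return the first result, and cancel the rest."""
    tasks = [asyncio.create_task(s) for s in searches]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # raises CancelledError inside the task at its current await
    await asyncio.gather(*pending, return_exceptions=True)  # wait until they have stopped
    return done.pop().result(), len(pending)


delays = [0.01, 0.2, 0.3, 0.4, 0.5]
result, cancelled = asyncio.run(
    first_answer([fake_shard_search(f"search-{i}", d) for i, d in enumerate(delays)])
)
print(result, "| cancelled:", cancelled)
```

Cancellation here is cooperative: a task only stops at an `await` point. Server-side termination depends on the same principle, with each shard checking for cancellation between units of work.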
API Examples
Capabilities you will not find in any other vector database.
Write documents with dense vectors, sparse vectors, and metadata in a single call.
```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_KEY")

client.namespaces.upsert(
    namespace="products",
    documents=[
        {
            "id": "doc-001",
            "dense_embedding": [0.12, -0.34, ...],  # 768-d
            "sparse_embedding": {"tokens": [1204, 879], "weights": [0.9, 0.4]},
            "metadata": {"category": "electronics", "price": 299.99},
            "text": "Noise-cancelling wireless headphones",
        }
    ],
)
```
How it works
[Architecture diagram. Components: SDK / REST API; 1 per namespace, consistent-hash routing; Snapshot on S3 (any S3-compatible object store)]
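The consistent-hash routing step in the architecture can be illustrated with a hash ring that uses virtual nodes. With this technique, adding a shard remaps only the namespaces that land on the new shard's ring points. The `HashRing` class, the use of MD5 for placement, and the virtual-node count are assumptions for illustration, not MVS internals.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on a 64-bit ring (MD5 used only to spread keys, not for security)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owners = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `vnodes` points for this node around the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def route(self, namespace):
        """Return the owner of the first ring point clockwise from the namespace's hash."""
        idx = bisect.bisect(self._points, _hash(namespace)) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing(["shard-a", "shard-b", "shard-c"])
namespaces = [f"ns-{i}" for i in range(1000)]
before = {ns: ring.route(ns) for ns in namespaces}
ring.add("shard-d")
moved = [ns for ns in namespaces if before[ns] != ring.route(ns)]
print(f"{len(moved)} of {len(namespaces)} namespaces moved")
```

Every namespace that moves lands on the new shard, and the rest keep their owner. With four equally weighted shards, roughly a quarter of namespaces is expected to move, compared with most keys under naive `hash % n` routing.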
Free Proof of Concept
See it working on your data
Book a 60-minute architecture review. We'll run MVS on your actual workload and benchmark it against your current vector database.
Ethan Steininger
Founder & CEO, Mixpeek
In 60 minutes, you will get:
90%+ average cost reduction vs Pinecone & Weaviate
< 1hr typical migration for 100M vectors
10B+ vectors supported per index
"We run 100M+ vectors on MVS backed by our own GCS bucket. Hot queries come back in single-digit ms, cold namespaces warm up in seconds, and our infra bill is a fraction of what Qdrant Cloud quoted us."
Early access customer
AI infrastructure team
Ready to scale to billions of vectors?
Start with 1M vectors free. No credit card required. Deploy in minutes.