    Now in Private Beta

    The vector database built on object storage

    Bring your own object storage. Dense + sparse + BM25 hybrid search, aggregations, and automatic tiering — no separate database to manage.

    ~8ms hot search, 50K+ writes/s, and 10K+ queries/s in production

    Cost calculator

    Estimate your monthly MVS bill based on vector count, dimensions, and usage. All pricing is pay-as-you-go with no upfront commitments.

    - Vector dimensions: the size of each embedding vector. Higher dimensions capture more nuance but use more storage. Common models: 384 (MiniLM), 768 (BERT), 1536 (OpenAI ada-002).
    - Number of vectors (1M-10B): total documents stored across all namespaces. Each document is a vector embedding plus its metadata payload.
    - Storage: object storage cost for persisting vector data and metadata. Priced at ~$0.023/GB/mo -- the same rate as S3 Standard.
    - Hot cache: PQ-compressed indexes and payload indexes kept in RAM for sub-10ms queries. Payloads are disk-backed (RocksDB), so the hot cache is compact -- only ~60MB per million vectors.
    - Writes: cost of upsert operations. Includes WAL logging, index updates, and replication to object storage.
    - Queries ($1/1M): search queries against your namespaces. Idle namespaces scale to zero -- you only pay when querying.
    - Shards: shards partition your data across multiple Rust workers for parallel query execution. MVS auto-scales shards as your dataset grows.
    - Namespaces (1,000 included): isolated collections within your account. Use them for multi-tenancy, A/B testing, or separating data by environment.
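    The rate card above is enough for a back-of-the-envelope estimate. A minimal sketch, assuming float32 (4-byte) vector components and the ~60MB-per-million-vectors hot-cache figure quoted above; this is an illustration, not Mixpeek's official billing formula:

```python
# Rates from the calculator above; the formula itself is an illustrative
# approximation, not Mixpeek's official billing logic.
STORAGE_PER_GB_MO = 0.023   # object storage pass-through ($/GB/mo)
HOT_CACHE_PER_GB_MO = 25.0  # PQ indexes in RAM ($/GB/mo)
WRITE_PER_M = 1.0           # $ per 1M writes
QUERY_PER_M = 1.0           # $ per 1M queries

def monthly_cost(num_vectors, dims, writes, queries,
                 bytes_per_dim=4, hot_mb_per_m=60):
    """Estimate a monthly MVS bill, assuming float32 components."""
    storage_gb = num_vectors * dims * bytes_per_dim / 1e9
    hot_gb = num_vectors / 1e6 * hot_mb_per_m / 1024
    return (storage_gb * STORAGE_PER_GB_MO
            + hot_gb * HOT_CACHE_PER_GB_MO
            + writes / 1e6 * WRITE_PER_M
            + queries / 1e6 * QUERY_PER_M)

# 1M vectors at 768d with 1M writes and 1M queries per month.
print(f"${monthly_cost(1_000_000, 768, 1_000_000, 1_000_000):.2f}")
```

    Note how little of the bill is storage: at 768 dimensions, 1M vectors is ~3GB of object storage but only ~60MB of hot cache.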
    Workload: dense ANN search over 768-dimensional vectors with top_k=10 on the configured number of shards (768 dimensions, 1M docs, ~500 MB).

    Latency by percentile (warm / cold):
    - p50 (median -- 50% of queries complete faster than this; the typical user experience): 7ms / 50ms
    - p90 (90th percentile -- only 10% of queries are slower; a good measure of consistent performance): 13ms / 80ms
    - p99 (99th percentile, tail latency -- the worst 1% of queries; critical for SLA guarantees and real-time apps): 28ms / 120ms

    Warm namespace: data is cached in memory/SSD. Queries hit the hot cache and return in single-digit milliseconds. This is the default for actively queried namespaces.
    Cold namespace: the centroid index is memory-mapped on local NVMe for instant partition selection; only the needed PQ codes are fetched from object storage on demand (~50ms). Idle namespaces scale to zero compute cost.

    Approach (1 shard with top_k=10)
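    The percentile definitions above can be computed directly from raw latency samples. A minimal nearest-rank sketch in pure Python; the sample values are made up for illustration:

```python
def percentile(samples_ms, p):
    """Nearest-rank percentile of a list of latency samples."""
    s = sorted(samples_ms)
    k = round(p / 100 * (len(s) - 1))  # rank index for the p-th percentile
    return s[k]

# Made-up per-query latencies, in milliseconds.
latencies = [5, 6, 6, 7, 7, 8, 9, 13, 21, 28]
for p in (50, 90, 99):
    print(f"p{p}: {percentile(latencies, p)}ms")
```

    In production you would use interpolated quantiles (e.g. Python's statistics.quantiles) over far more samples, but the shape is the same: p50 tracks the typical query, p99 the worst stragglers.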

    vs Competitors — 50K vectors, 768d, concurrency=1

    p50 Latency
    Milvus     3.1ms
    Weaviate   3.3ms
    MVS        5.5ms
    Qdrant     5.5ms
    pgvector   11.4ms

    Queries / sec
    Milvus     313
    Weaviate   291
    Chroma     176
    Qdrant     158
    MVS        157

    Recall@10
    Milvus     26.1%
    pgvector   21.0%
    Weaviate   17.6%
    MVS        17.1%
    Qdrant     16.6%

    MVS stores vectors on your object storage (S3/GCS/B2) with PQ-compressed indexes in RAM. The result: competitive latency, scale-to-zero idle cost, and 10–80x lower cost than Pinecone/Weaviate at scale.

    View full benchmark methodology & results

    Usage-Based Pricing

    Pay only for what you use. Idle namespaces scale to zero. No node provisioning.

    $0.023

    per GB / month

    Storage

    S3 / GCS / B2 pass-through

    $25

    per GB / month

    Hot Cache

    PQ indexes in RAM, sub-10ms

    $1

    per 1M writes

    Writes

    Upsert, update, delete

    $1

    per 1M queries

    Queries

    Idle namespaces scale to zero

    How MVS compares at scale

    Vectors   MVS       turbopuffer     Qdrant          Pinecone        Weaviate
    1M        Free      $10             $120            $500            $73
    10M       $49       $65             $460            $2,700          $730
    100M      $299      $358            $1,255          $25,000         $7,300
    1B        $1,999    $2,800          $12,500         $26,000         $73,000
    5B        $7,999    Contact sales   Contact sales   Contact sales   Contact sales
    10B       $14,999   Contact sales   Contact sales   Contact sales   Contact sales

    Competitor prices from public pricing calculators (768d, ~100 QPS). turbopuffer = serverless S3-native with tiered discounts; Qdrant = dedicated cluster; Pinecone = serverless read units; Weaviate = per-dimension pricing. MVS includes scale-to-zero — idle namespaces cost only storage. Benchmark repo.

    Feature Comparison

    MVS vs the leading vector databases -- Pinecone, Qdrant, and Turbopuffer. Many of the capabilities below are MVS-exclusive.

    Search
    - Dense vector search (ANN): approximate nearest neighbor search over high-dimensional embeddings. The foundation of semantic search -- find results by meaning, not keywords.
    - Sparse vector search: search using sparse vectors like SPLADE or learned sparse embeddings. Captures keyword-level signals that dense vectors miss.
    - BM25 full-text search: classic keyword search built on an inverted index. MVS uses Tantivy natively -- no SPLADE workarounds or external engines needed.
    - Multi-dense (ColBERT): late-interaction retrieval that stores per-token embeddings for higher recall. Enables token-level matching without collapsing to a single vector.
    - Hybrid search (RRF/DBSF fusion): combine dense, sparse, and keyword results into a single ranked list using Reciprocal Rank Fusion or Distribution-Based Score Fusion.
    - Multi-stage retrieval pipelines: chain retrieval stages -- e.g. broad recall with ANN, then re-rank with a cross-encoder -- in a single query. Reduces latency vs round-trips.
    - Standing queries (push on match): register a persistent query that fires a webhook whenever a newly ingested document matches. Useful for alerting, monitoring, and real-time feeds.
    - Semantic JOINs across namespaces: join two namespaces by vector similarity -- like a SQL JOIN but on embeddings. No denormalization or data duplication required.

    Data Operations
    - Aggregation (GROUP BY, COUNT, SUM, AVG): run analytics directly on your vector store. Group documents by metadata fields and compute counts, averages, and sums -- no ETL to a data warehouse.
    - Cross-shard transactions (2PC): atomic writes across multiple shards using two-phase commit. Ensures all-or-nothing consistency even at billion-scale.
    - Optimistic concurrency (_version): prevent write conflicts with version-based optimistic locking. Critical for multi-writer workloads where two processes might update the same document.
    - Change streams (WAL-tailing, SSE): subscribe to real-time insert/update/delete events via Server-Sent Events. Build reactive pipelines without polling your database.
    - Time-travel queries (WAL replay): query your data as it existed at a past point in time by replaying the write-ahead log. Useful for debugging, auditing, and reproducibility.
    - Document version history: every mutation is versioned. Roll back a document to any prior state or diff two versions to see exactly what changed.
    - Query audit log: full audit trail of every query executed -- who ran it, when, and what was returned. Essential for compliance and debugging in production.

    Reliability & Governance
    - Storage tiering (hot/cold/archive): automatically move infrequently accessed data from memory/SSD to object storage. Cut costs without manual data management. MVS tiering is automatic and object storage-backed.
    - Retention policies: set TTLs on documents or namespaces. Data is automatically purged after the retention window -- no cron jobs or manual cleanup.
    - Namespace catalog (INFORMATION_SCHEMA): discover all namespaces, their schemas, row counts, and storage usage via a system catalog, like INFORMATION_SCHEMA in SQL databases.
    - Multi-tenant isolation (noisy neighbor): resource isolation between tenants prevents one workload from starving others. Each namespace has independent rate limits and resource quotas.
    - Priority lanes (QoS scheduling): assign CRITICAL/NORMAL/BACKGROUND/BULK priority to requests. Higher-priority queries get reserved compute slots and preempt lower-priority work in the shard queue.
    - Idempotent operations: every write accepts an idempotency key. Retries from crashes or network timeouts are automatically deduplicated -- no duplicate documents, no double-counted aggregations.
    - Distributed execution traces: a full distributed trace for every query -- coordinator routing, per-shard timing, filter selectivity, index hits. Debug multi-hop requests across the entire fan-out path.

    Agentic Workloads
    - Streaming partial results (SSE): get results as shards respond instead of waiting for all shards. Agents evaluate early hits and decide whether to refine or cancel -- the tight feedback loop that defines agentic retrieval.
    - Query cancellation (cooperative termination): cancel in-flight fan-out queries that are no longer needed. When an agent fires 5 parallel searches and gets an answer from the first, the other 4 are terminated at the shard level, freeing compute instantly.
    - Per-agent budget limits: enforce max queries, writes, and compute per agent or API key at the coordinator level. Prevents runaway autonomous loops -- the failure mode where an LLM in a loop issues unbounded queries.

    Infrastructure
    - Object storage-native (no separate DB to manage): data lives in your object storage (S3, GCS, Azure Blob). No separate database cluster to provision, back up, or scale -- just point MVS at your bucket.
    - Self-hosted option (OSS): deploy MVS in your own VPC or on-prem. Full control over data residency, network policies, and infrastructure -- no vendor lock-in.
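    The Reciprocal Rank Fusion listed under Hybrid search can be sketched in a few lines of pure Python. Document IDs and the two result lists below are made up; MVS performs this fusion server-side in a single query:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score each doc by the sum of 1/(k + rank)
    across all result lists, then sort by total score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["a", "b", "c"]   # from ANN search
bm25_hits  = ["b", "d", "a"]   # from keyword search
print(rrf([dense_hits, bm25_hits]))  # docs appearing in both lists rank highest
```

    The constant k=60 is the value from the original RRF paper; it damps the influence of top ranks so that a document appearing in several lists beats one that is merely first in a single list.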

    API Examples

    Capabilities you will not find in any other vector database.

    Write documents with dense embeddings, sparse embeddings, and metadata in a single call.

    from mixpeek import Mixpeek

    client = Mixpeek(api_key="YOUR_KEY")

    client.namespaces.upsert(
        namespace="products",
        documents=[
            {
                "id": "doc-001",
                "dense_embedding": [0.12, -0.34, ...],  # 768-d
                "sparse_embedding": {"tokens": [1204, 879], "weights": [0.9, 0.4]},
                "metadata": {"category": "electronics", "price": 299.99},
                "text": "Noise-cancelling wireless headphones",
            }
        ],
    )

    How it works

    Watch a request flow through the architecture.

    Client -- SDK / REST API
        |
    Ray Coordinator -- 1 per namespace · consistent-hash routing. Handles query planning and gRPC fan-out to the shards.
        |
    Rust Shard 0 (~8ms) · Rust Shard 1 (~8ms) · Cold Shard (~200ms)
    Each shard runs LIRE ANN, Tantivy BM25, sparse, and payload indexes.
        |  snapshot on S3
    Your Object Storage -- any S3-compatible object store. The canonical store: WAL durability, and you own the data.
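    The consistent-hash routing step above can be sketched in a few lines. Shard names and the virtual-node count here are illustrative, not MVS internals:

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Stable hash; md5 is used here only for determinism, not security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Map namespaces to shards on a consistent-hash ring."""

    def __init__(self, shards, vnodes=64):
        # Each shard gets `vnodes` ring positions so load spreads evenly
        # and adding a shard only remaps a small slice of namespaces.
        self.ring = sorted((_h(f"{s}#{i}"), s)
                           for s in shards for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def route(self, namespace: str) -> str:
        # First ring position clockwise of the namespace's hash.
        i = bisect.bisect(self.keys, _h(namespace)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["shard-0", "shard-1", "shard-2"])
print(ring.route("products"))  # always the same shard for this namespace
```

    The payoff is stable routing: every request for a given namespace lands on the same coordinator/shard, so caches stay warm, while rebalancing after a topology change touches only the keys adjacent to the new ring positions.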

    Free Proof of Concept

    See it working on your data

    Book a 60-minute architecture review. We'll run MVS on your actual workload and benchmark it against your current vector database.


    Ethan Steininger

    Founder & CEO, Mixpeek

    In 60 minutes, you will get:

    Live migration of your vectors to MVS on your own object storage
    Side-by-side latency and cost comparison vs your current setup
    Multi-stage retrieval pipeline built for your use case
    Tiered storage plan — what stays hot, what moves to warm
    A concrete POC timeline with success criteria

    90%+ -- avg. cost reduction vs Pinecone & Weaviate
    < 1hr -- typical migration for 100M vectors
    10B+ -- vectors supported per index

    "We run 100M+ vectors on MVS backed by our own GCS bucket. Hot queries come back in single-digit ms, cold namespaces warm up in seconds, and our infra bill is a fraction of what Qdrant Cloud quoted us."


    Early access customer

    AI infrastructure team

    Ready to scale to billions of vectors?

    Start with 1M vectors free. No credit card required. Deploy in minutes.