pplx-embed-v1-0.6b

by perplexity-ai

Diffusion-pretrained 0.6B text embeddings with INT8 quantization — SOTA at sub-1B scale

120K downloads/month

0.6B params


Identifiers

Model ID

perplexity-ai/pplx-embed-v1-0.6b

Feature URI

mixpeek://text_extractor@v1/perplexity_pplx_embed_v1_06b

Overview

pplx-embed-v1-0.6B is Perplexity AI's lightweight text embedding model. It is built on a Qwen3 base that received diffusion-based continued pre-training, and it uses bidirectional attention. The model natively produces INT8-quantized embeddings, which cuts storage by 4x relative to FP32 (1 byte per dimension instead of 4) while maintaining retrieval quality. At 0.6B parameters, it scores 68.6 nDCG@10 on MTEB Retrieval, ahead of Qwen3-Embed-0.6B (61.2), a model of the same size, and BGE-M3 (62.3).

The model supports a 32K context length and produces 1024-dimensional embeddings, with optional binary quantization for a 32x storage reduction relative to FP32. On Mixpeek, pplx-embed provides a fast, storage-efficient embedding backbone for text-heavy retrieval pipelines where index size and inference cost are the primary constraints.
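The 4x and 32x figures follow directly from the bits stored per dimension. The sketch below works through that arithmetic for a 1024-dimensional vector. The helper names and the 10-million-document corpus are hypothetical, chosen only for illustration, and the estimate ignores any index overhead a vector store adds on top of the raw vectors.

```typescript
// Storage arithmetic for one embedding at different precisions.
// DIMS = 1024 comes from the model card; NUM_DOCS is a hypothetical corpus size.
type Precision = "fp32" | "int8" | "binary";

const BITS_PER_DIM: Record<Precision, number> = { fp32: 32, int8: 8, binary: 1 };

// Bytes needed for a single vector, rounded up to a whole byte.
function bytesPerVector(dims: number, precision: Precision): number {
  return Math.ceil((dims * BITS_PER_DIM[precision]) / 8);
}

// Raw vector storage for a corpus, excluding index and metadata overhead.
function indexSizeBytes(numVectors: number, dims: number, precision: Precision): number {
  return numVectors * bytesPerVector(dims, precision);
}

const DIMS = 1024;
const NUM_DOCS = 10_000_000; // hypothetical

for (const precision of ["fp32", "int8", "binary"] as const) {
  const gib = indexSizeBytes(NUM_DOCS, DIMS, precision) / 2 ** 30;
  console.log(`${precision}: ${bytesPerVector(DIMS, precision)} B/vector, ${gib.toFixed(2)} GiB`);
}
```

Per vector this gives 4096 bytes at FP32, 1024 bytes at INT8, and 128 bytes for binary, which matches the 4x and 32x reductions quoted above.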

Architecture

Bidirectional-attention transformer built on Qwen3 with diffusion-based continued pre-training. 0.6B parameters. 32K context length. Natively produces INT8-quantized 1024-dimensional embeddings. Supports binary quantization for a 32x storage reduction relative to FP32.
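To make the binary option concrete, here is a minimal sketch of one common way to binarize an embedding: keep only the sign of each component, pack eight components per byte, and compare vectors by Hamming distance. Sign thresholding is an assumption made for illustration; the page does not say how the model derives its binary embeddings, and the function names are hypothetical.

```typescript
// Pack an INT8 embedding into bits: bit i is 1 when component i is positive.
// Assumption: sign thresholding, which may differ from the model's own scheme.
function binarize(embedding: Int8Array): Uint8Array {
  const packed = new Uint8Array(Math.ceil(embedding.length / 8));
  for (let i = 0; i < embedding.length; i++) {
    if (embedding[i] > 0) {
      packed[i >> 3] |= 1 << (7 - (i & 7)); // most significant bit first
    }
  }
  return packed;
}

// Hamming distance: the number of bit positions where two packed vectors differ.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("length mismatch");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let diff = a[i] ^ b[i];
    while (diff !== 0) {
      diff &= diff - 1; // clear the lowest set bit
      distance++;
    }
  }
  return distance;
}
```

A 1024-dimensional embedding packs into 128 bytes this way. A smaller Hamming distance means more sign agreement between two vectors, so it can serve as a cheap first-pass ranking signal before rescoring with the INT8 vectors.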

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "text_embedding",
    version: "v1",
    parameters: { model_id: "perplexity-ai/pplx-embed-v1-0.6b" },
  },
});

Capabilities

68.6 nDCG@10 on MTEB Retrieval — SOTA at sub-1B scale
Native INT8 quantization (4x storage reduction)
Optional binary embeddings (32x storage reduction)
32K context window for long documents
Beats BGE-M3 and Qwen3-Embed-0.6B on retrieval benchmarks
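Since the embeddings arrive as INT8, similarity can be computed on the integer values without converting back to floats first. The sketch below ranks documents against a query by cosine similarity over Int8Array vectors. Cosine is assumed here as the similarity measure, and both function names are hypothetical; check the model's documentation for the recommended metric.

```typescript
// Cosine similarity computed directly on INT8 embedding components.
// Assumption: cosine is the intended similarity measure for these vectors.
function cosineInt8(a: Int8Array, b: Int8Array): number {
  if (a.length !== b.length) throw new Error("length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat an all-zero vector as unrelated
  return dot / Math.sqrt(normA * normB);
}

// Return document indices ordered from most to least similar to the query.
function rankBySimilarity(query: Int8Array, docs: Int8Array[]): number[] {
  return docs
    .map((doc, index) => ({ index, score: cosineInt8(query, doc) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index);
}
```

For 1024 dimensions the accumulated sums stay far below the range where JavaScript numbers lose integer precision, so the dot product and norms are exact.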

Use Cases on Mixpeek

Storage-efficient text indexing: embed large document collections with minimal disk footprint

High-throughput RAG pipelines: fast embedding generation for retrieval-augmented generation

Cost-sensitive text search: strong retrieval quality at minimal compute and storage cost

Benchmarks

Dataset	Metric	Score	Source
MTEB Retrieval (en)	nDCG@10	68.6	Perplexity AI, 2026, arXiv:2602.11151
BERGEN End-to-End RAG	Avg score	Beats Qwen3-embedding-4B on 3/5 tasks	Perplexity AI, 2026, arXiv:2602.11151
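For readers unfamiliar with the metric in the table, nDCG@10 rewards rankings that place relevant documents near the top of the first ten results. The sketch below computes it for a single query, assuming linear gain and a log2 position discount. Benchmark tooling may differ in details such as the gain function, and the function names are hypothetical. A benchmark score is the average over all queries, usually reported times 100.

```typescript
// Discounted cumulative gain over the top k results.
// Assumption: linear gain (the relevance grade itself) and a log2 position discount.
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k for one query.
// ranked: relevance grade of each returned document, in ranked order.
// judged: relevance grades of all documents judged relevant for the query.
function ndcgAtK(ranked: number[], judged: number[], k: number): number {
  const ideal = [...judged].sort((a, b) => b - a); // best possible ordering
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(ranked, k) / idealDcg;
}

// Two relevant documents exist; the system returned them at ranks 1 and 3.
console.log(ndcgAtK([1, 0, 1], [1, 1], 10).toFixed(3));
```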