This is the definitive guide to multi-stage retrieval. For ready-to-copy pipeline configs, see the Retrieval Cookbook. For the full stage catalog and parameter schemas, see Retrievers.
Why Single-Query Search Isn’t Enough
Traditional search systems give you one query, one index, one ranked list. This creates three problems that compound as your data grows:
1. Signal collapse. You want to find content that matches a face and contains a specific logo and has negative sentiment. A single vector query can only encode one of these signals. You end up running three separate queries and stitching results together in application code.
2. N+1 enrichment. After retrieving results, you need to join them with metadata from another collection, call an external API for licensing info, or classify each result against a taxonomy. Without pipeline-level enrichment, every result triggers a separate round-trip from your application.
3. Brittle application logic. Filtering, ranking, deduplication, and reshaping all live in your application layer. Every new use case means new glue code. Every change to ranking logic means a redeploy.
Multi-stage retrieval moves all of this into the retriever definition itself: a declarative pipeline that the engine executes in a single pass.
The SQL Analogy
If you know SQL, you already understand multi-stage retrieval. Each stage type maps to a SQL clause:
| Stage Type | SQL Equivalent | What It Does |
|---|---|---|
| filter | WHERE | Narrow the document set based on conditions --- semantic similarity, metadata predicates, feature thresholds |
| sort | ORDER BY | Reorder documents by score, attribute, or cross-encoder reranking |
| reduce | LIMIT / GROUP BY | Collapse results --- top-k sampling, deduplication, aggregation, summarization |
| enrich | JOIN | Add data from other collections, LLM-generated fields, or taxonomy classifications |
| apply | SELECT / TRANSFORM | Reshape output, call external APIs, execute custom code, run web searches |
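To make the mapping in the table concrete, here is a toy in-memory model in Python. It is only a sketch of the semantics: the documents, field names, and the catalog lookup are invented for illustration, and none of this is the engine's actual API.

```python
# Toy model of the five stage types over a list of dicts.
# All data and field names here are hypothetical.
docs = [
    {"id": 1, "brand": "acme", "score": 0.91, "url": "a.com"},
    {"id": 2, "brand": "acme", "score": 0.75, "url": "a.com"},
    {"id": 3, "brand": "zeta", "score": 0.88, "url": "z.com"},
    {"id": 4, "brand": "acme", "score": 0.60, "url": "b.com"},
]
catalog = {"acme": {"tier": "gold"}, "zeta": {"tier": "silver"}}

# filter ~ WHERE brand = 'acme'
filtered = [d for d in docs if d["brand"] == "acme"]

# sort ~ ORDER BY score DESC
ranked = sorted(filtered, key=lambda d: d["score"], reverse=True)

# reduce ~ LIMIT 2
top = ranked[:2]

# enrich ~ JOIN catalog ON brand (result count is unchanged)
enriched = [{**d, **catalog[d["brand"]]} for d in top]

# apply ~ SELECT id, tier
result = [{"id": d["id"], "tier": d["tier"]} for d in enriched]
print(result)
```

Each step takes the previous step's output as its input, which is the same hand-off that happens between stages in a pipeline.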
The Five Stage Types
Filter --- Narrow the Candidate Set
Filter stages reduce the number of documents flowing through the pipeline. They are the WHERE clause of your retrieval query. Every pipeline starts with at least one filter.
Use filter stages to:
- Run semantic similarity search against any extracted feature
- Apply metadata predicates (equality, range, set membership)
- Chain multiple filters for compound conditions (face match AND logo match AND date range)
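The compound condition in the last bullet can be sketched as filters applied one after another, each narrowing the previous output. This is a pure-Python illustration, assuming two-dimensional toy vectors and hypothetical field names (face_vec, logo_vec, year); a real vector index would not scan documents linearly like this.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_filter(docs, field, query_vec, min_score):
    """Keep documents whose vector in `field` is close enough to the query."""
    return [d for d in docs if cosine(d[field], query_vec) >= min_score]

def metadata_filter(docs, predicate):
    """Keep documents for which the structured predicate holds."""
    return [d for d in docs if predicate(d)]

# Hypothetical scene documents with toy embeddings.
docs = [
    {"id": "s1", "face_vec": [1.0, 0.0], "logo_vec": [0.0, 1.0], "year": 2024},
    {"id": "s2", "face_vec": [0.9, 0.1], "logo_vec": [1.0, 0.0], "year": 2024},
    {"id": "s3", "face_vec": [0.0, 1.0], "logo_vec": [0.0, 1.0], "year": 2024},
    {"id": "s4", "face_vec": [1.0, 0.0], "logo_vec": [0.1, 0.9], "year": 2019},
]
face_query, logo_query = [1.0, 0.0], [0.0, 1.0]

step1 = similarity_filter(docs, "face_vec", face_query, 0.8)    # face match
step2 = similarity_filter(step1, "logo_vec", logo_query, 0.8)   # AND logo match
step3 = metadata_filter(step2, lambda d: 2023 <= d["year"] <= 2025)  # AND date range
matched_ids = [d["id"] for d in step3]
print(matched_ids)
```

Because each filter only sees what the previous one kept, the chain behaves like a logical AND across the three conditions.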
Sort --- Control Ranking
Sort stages reorder the document set without adding or removing documents. They are the ORDER BY clause. Place them after filters to control which results appear first.
Use sort stages to:
- Apply weighted linear scoring across multiple signals
- Rerank results with a cross-encoder model for higher precision
- Sort by a metadata attribute (date, price, popularity)
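Weighted linear scoring, the first bullet above, amounts to ranking by a weighted sum of per-document signals. The sketch below shows that arithmetic only; the function name, the signal names (semantic, recency), and the weights are assumptions for illustration, not the parameter schema of any real stage.

```python
def weighted_linear_sort(docs, weights):
    """Return docs ordered by a weighted sum of named signals, highest first."""
    def combined(doc):
        # Missing signals contribute zero rather than raising.
        return sum(w * doc.get(signal, 0.0) for signal, w in weights.items())
    return sorted(docs, key=combined, reverse=True)

docs = [
    {"id": "a", "semantic": 0.90, "recency": 0.10},  # 0.7*0.90 + 0.3*0.10 = 0.66
    {"id": "b", "semantic": 0.70, "recency": 0.90},  # 0.7*0.70 + 0.3*0.90 = 0.76
    {"id": "c", "semantic": 0.80, "recency": 0.20},  # 0.7*0.80 + 0.3*0.20 = 0.62
]
ranked = weighted_linear_sort(docs, {"semantic": 0.7, "recency": 0.3})
order = [d["id"] for d in ranked]
print(order)
```

Note that the output has the same documents as the input, only reordered, which matches the definition of a sort stage.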
Reduce --- Collapse and Limit
Reduce stages collapse the result set. They are the LIMIT, GROUP BY, and DISTINCT clauses. Use them to control result count, remove duplicates, or aggregate values.
Use reduce stages to:
- Sample the top-k results after sorting
- Deduplicate by a field (e.g., one result per source URL)
- Summarize results into an aggregated output
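Deduplication followed by a top-k cut can be sketched in a few lines. This is an illustrative model, assuming the input is already ranked and that "first seen wins" is the desired dedup policy; the helper names and the url field are hypothetical.

```python
def dedup_by(docs, field):
    """Keep the first document seen for each distinct value of `field`."""
    seen = set()
    kept = []
    for doc in docs:
        key = doc[field]
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept

def top_k(docs, k):
    """Keep the first k documents of an already-ranked list."""
    return docs[:k]

# Already sorted by score, highest first.
ranked = [
    {"id": 1, "url": "a.com/x", "score": 0.95},
    {"id": 2, "url": "a.com/x", "score": 0.93},
    {"id": 3, "url": "b.com/y", "score": 0.90},
    {"id": 4, "url": "c.com/z", "score": 0.85},
]
reduced = top_k(dedup_by(ranked, "url"), 2)
ids = [d["id"] for d in reduced]
print(ids)
```

Running dedup before the top-k cut matters here: cutting first would keep documents 1 and 2, which share a URL, and then collapse them to a single result.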
Enrich --- Join External Knowledge
Enrich stages add data to each document without changing the result set size. They are the JOIN clause. Use them to attach metadata from other collections, generate LLM-powered annotations, or classify documents against taxonomies.
Use enrich stages for:
- Cross-collection joins (product data + catalog info + pricing)
- LLM enrichment (generate summaries, extract entities, assess risk)
- Taxonomy classification (label documents against a controlled vocabulary)
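A cross-collection join can be pictured as a keyed lookup against a second collection. The sketch below is a hypothetical model, not the engine's implementation: FakeCollection, get_many, and the sku and price fields are all invented. It collects every key first and resolves them with one lookup call, so the query counter shows the difference from a per-document loop.

```python
class FakeCollection:
    """Stand-in for a target collection that counts the queries it serves."""
    def __init__(self, rows, key):
        self._rows = {r[key]: r for r in rows}
        self.queries = 0

    def get_many(self, keys):
        """Resolve many keys in one call; unknown keys are simply absent."""
        self.queries += 1
        return {k: self._rows[k] for k in keys if k in self._rows}

def enrich_join(docs, collection, key, fields):
    """Attach `fields` from the collection to each doc, using one batch lookup."""
    keys = {d[key] for d in docs}
    found = collection.get_many(keys)
    out = []
    for d in docs:
        match = found.get(d[key], {})
        # Documents without a match keep their place and get None values.
        out.append({**d, **{f: match.get(f) for f in fields}})
    return out

pricing = FakeCollection(
    [{"sku": "A1", "price": 19.0}, {"sku": "B2", "price": 5.5}], key="sku"
)
docs = [{"id": 1, "sku": "A1"}, {"id": 2, "sku": "B2"}, {"id": 3, "sku": "ZZ"}]
enriched = enrich_join(docs, pricing, "sku", ["price"])
print(enriched, pricing.queries)
```

The unmatched document is kept with a None price, so the result set size stays the same, as it would in a left join.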
Enrich stages execute per-document but are batched internally. A document_enrich join resolves all lookups in a single batch query to the target collection, not one query per document.
Apply --- Transform and Reshape
Apply stages transform the structure or content of each document. They are the SELECT and function-call layer of your pipeline. Use them to reshape output for downstream consumers, call external APIs, execute custom code, or search the web.
Use apply stages to:
- Reshape JSON output with Jinja2 templates
- Call external APIs (Stripe, Salesforce, internal services)
- Execute custom Python/TypeScript/JavaScript in sandboxed environments
- Run web searches to augment results with external context
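Reshaping, the first bullet above, means projecting each document into the shape a downstream consumer expects. The sketch below uses Python's standard-library string.Template as a dependency-free stand-in for a Jinja2 template; the field names and the 0.8 risk cutoff are assumptions made up for this example.

```python
from string import Template

# Stand-in for a template expression; Jinja2 would offer richer syntax.
CAPTION = Template("$title ($duration_s s)")

def reshape(doc):
    """Project a document into a smaller, consumer-facing shape."""
    return {
        "id": doc["id"],
        "caption": CAPTION.substitute(
            title=doc["title"], duration_s=doc["duration_s"]
        ),
        # Hypothetical cutoff: scores at or above 0.8 are labeled high risk.
        "risk": "high" if doc["risk_score"] >= 0.8 else "low",
    }

doc = {
    "id": "v1",
    "title": "Launch clip",
    "duration_s": 42,
    "risk_score": 0.91,
    "internal_notes": "not for consumers",
}
shaped = reshape(doc)
print(shaped)
```

Fields that are not named in the projection, such as internal_notes, do not appear in the output.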
Building Multi-Stage Pipelines
The power of multi-stage retrieval is in composition. Here are three production pipelines that demonstrate how stages chain together to solve complex problems that no single query can address.
Pipeline 1: Brand Safety Scanner
Problem: A media company needs to find scenes where their talent appears near competitor products in negative-sentiment content, before the content goes live.
Pipeline logic: Find faces matching talent roster, then check for competitor logos in the same scenes, rank by sentiment risk, take the worst offenders, and attach brand safety context.
Pipeline 2: IP Clearance Pipeline
Problem: Before publishing new content, a legal team needs to check it against a database of copyrighted material across audio fingerprints, visual similarity, and metadata, then attach licensing information for review.
Pipeline logic: Match audio fingerprints, check visual similarity for the same assets, filter by rights status, sort by match confidence, and attach the full licensing record.
Pipeline 3: Content Moderation
Problem: A platform needs to scan user-uploaded content across multiple safety dimensions (NSFW, violence, toxicity), aggregate risk scores, and route flagged content to a moderation queue.
Pipeline logic: Filter for NSFW content above threshold, check text toxicity, sort by combined risk, take the worst offenders, classify against a moderation taxonomy, and push to the review queue.
Performance Characteristics
Multi-stage pipelines avoid the N+1 problem that plagues application-level orchestration. Here is how:
1. Filter stages execute server-side against indexes. A feature_search filter runs directly against the MVS vector index. No data leaves the engine until the candidate set is narrowed. Chaining two filter stages does not mean two round-trips from your application: both execute within the engine in sequence.
2. Enrich stages batch internally. A document_enrich join across 50 results resolves in a single batch query to the target collection, not 50 separate lookups. LLM enrichment stages batch prompts where possible.
3. Reduce stages shrink the working set early. Place a sampling or dedup stage as early as possible to minimize the number of documents flowing through expensive downstream stages (LLM enrichment, API calls).
4. The pipeline streams rather than materializes. Documents flow through stages incrementally. A 6-stage pipeline does not create 6 intermediate copies of the full result set. Each stage processes and passes documents forward.
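Points 3 and 4 can be illustrated with Python generators, which pass documents forward one at a time. This is an analogy under stated assumptions, not a description of the engine's internals: the stage functions are hypothetical, and the counter stands in for an expensive LLM or API call.

```python
from itertools import islice

calls = {"enrich": 0}

def filter_stage(docs, min_score):
    """Yield only documents at or above the score threshold."""
    for d in docs:
        if d["score"] >= min_score:
            yield d

def expensive_enrich(docs):
    """Yield enriched documents, counting each simulated expensive call."""
    for d in docs:
        calls["enrich"] += 1  # stands in for an LLM or API call
        yield {**d, "summary": f"doc {d['id']}"}

def take(docs, k):
    """Lazily pass through at most k documents."""
    return islice(docs, k)

corpus = [{"id": i, "score": i / 100} for i in range(100)]

# Reduce before enrich: only k documents ever reach the expensive stage.
early = list(expensive_enrich(take(filter_stage(corpus, 0.5), 5)))
early_calls = calls["enrich"]

# Enrich everything, then cut: every filtered document pays the cost.
calls["enrich"] = 0
late = list(expensive_enrich(filter_stage(corpus, 0.5)))[:5]
late_calls = calls["enrich"]

print(early_calls, late_calls)
```

Both orderings return the same five documents, but the first makes 5 expensive calls and the second makes 50. A stage that needs the whole set, such as a full sort, cannot stream in this simple way, so the analogy applies to per-document stages.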
When to Use Which Stage Type
Use this decision guide when designing your pipeline:
I need to narrow down results based on content or metadata
Use a filter stage. Start with feature_search for semantic/vector-based filtering, or metadata for structured attribute filtering. Chain multiple filters for compound conditions.
I need to reorder results by relevance or business logic
Use a sort stage. Choose score_linear for weighted multi-signal ranking, cross_encoder_rerank for high-precision reranking with a cross-encoder model, or attribute_sort for simple field-based ordering.
I need fewer results, or deduplicated results
Use a reduce stage. Choose sampling for top-k limits, dedup for deduplication by field, or summarize for LLM-powered aggregation of results into a single summary.
I need to add data from another collection, an LLM, or a taxonomy
Use an enrich stage. Choose document_enrich for cross-collection joins, llm_enrich for AI-generated fields, or taxonomy_enrich for classification against a controlled vocabulary.
I need to reshape output, call an API, or run custom logic
Use an apply stage. Choose json_transform for output reshaping, api_call for external service integration, code_execution for custom Python/TypeScript/JavaScript, or external_web_search for web augmentation.
Stage Ordering Rules of Thumb
- Filter first. Every pipeline should start with one or more filter stages to narrow the candidate set.
- Sort second. Apply ranking after filtering so you are sorting a smaller set.
- Reduce third. Cut the result set to a manageable size before enrichment.
- Enrich fourth. Add external data only to the documents that survived filtering, sorting, and reduction.
- Apply last. Reshape output and trigger side effects at the end of the pipeline.
These are guidelines, not hard rules. Some pipelines benefit from enriching before sorting (e.g., sort by a field that only exists after enrichment). Design your pipeline around your data flow, not a rigid template.
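One way to apply these rules of thumb is an advisory check over a pipeline definition. The sketch below assumes a pipeline is a list of dicts with hypothetical type and stage_id keys, which is not the documented retriever schema. The stage IDs chosen for the brand safety example are a guess at how its described logic might map onto stages.

```python
ORDER = ["filter", "sort", "reduce", "enrich", "apply"]

def ordering_warnings(stages):
    """Return advisory notes where stage order departs from the rules of thumb."""
    notes = []
    if stages and stages[0]["type"] != "filter":
        notes.append("pipeline does not start with a filter stage")
    highest = -1
    for i, stage in enumerate(stages):
        rank = ORDER.index(stage["type"])
        if rank < highest:
            notes.append(
                f"stage {i} ({stage['type']}) follows an earlier {ORDER[highest]} stage"
            )
        highest = max(highest, rank)
    return notes

# Hypothetical mapping of the brand safety logic onto stage types.
brand_safety = [
    {"type": "filter", "stage_id": "feature_search"},   # faces
    {"type": "filter", "stage_id": "feature_search"},   # logos
    {"type": "sort", "stage_id": "score_linear"},
    {"type": "reduce", "stage_id": "sampling"},
    {"type": "enrich", "stage_id": "document_enrich"},
]

# A deliberate exception: sorting by a field that only exists after enrichment.
enrich_then_sort = [
    {"type": "filter", "stage_id": "metadata"},
    {"type": "enrich", "stage_id": "llm_enrich"},
    {"type": "sort", "stage_id": "attribute_sort"},
]

print(ordering_warnings(brand_safety))
print(ordering_warnings(enrich_then_sort))
```

The check returns notes rather than raising errors, since a pipeline like enrich_then_sort can be a valid design when the sort key is produced by the enrichment.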
Related Resources
Retrievers
Full stage catalog, parameter schemas, and retriever configuration reference
Retrieval Cookbook
Ready-to-copy pipeline configurations for common use cases
Stage Reference
Detailed documentation for every stage type and stage ID
Caching
Configure retriever-level caching for repeated queries

