Engine processes
Ray pollers pick up the batch, execute extractors tier-by-tier, and write documents to MVS.
Create a Batch
Submit for Processing
- `include_processing_history=true` records each enrichment operation in `internal_metadata.processing_history`.
- The response contains a `task_id`; poll `/v1/tasks/{task_id}` or the batch resource directly.
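A submit-then-poll loop can be sketched as below. The `fetch` callable stands in for whatever client code issues `GET /v1/tasks/{task_id}`; the helper itself and its parameter names are illustrative, not part of the API.

```python
import time

TERMINAL = {"COMPLETED", "FAILED"}  # terminal statuses from the lifecycle table

def wait_for_task(fetch, interval=2.0, timeout=600.0):
    """Poll a task until it reaches a terminal status.

    `fetch` is any zero-argument callable returning the task dict,
    e.g. a thin wrapper around GET /v1/tasks/{task_id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch()
        if task["status"] in TERMINAL:
            return task
        time.sleep(interval)
    raise TimeoutError("task did not reach a terminal status in time")
```

Injecting `fetch` keeps the loop testable and lets you reuse it for both the task endpoint and the batch resource.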
Lifecycle & Status
| Status | Meaning |
|---|---|
| `DRAFT` | Created but not submitted |
| `QUEUED` | Submitted; waiting for poller pickup |
| `PROCESSING` | Ray job running feature extractors |
| `COMPLETED` | All extractors finished successfully |
| `FAILED` | Extractors or the Ray job failed (see `error_message`) |
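The lifecycle above implies a simple state machine. The sketch below encodes the transitions the table describes; it is illustrative only, and the service enforces its own rules server-side.

```python
# Allowed batch status transitions, per the lifecycle table.
TRANSITIONS = {
    "DRAFT": {"QUEUED"},
    "QUEUED": {"PROCESSING"},
    "PROCESSING": {"COMPLETED", "FAILED"},
    "COMPLETED": set(),  # terminal
    "FAILED": set(),     # terminal
}

def can_transition(current: str, new: str) -> bool:
    """Return True if a batch may move from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())
```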
Under the Hood
- API writes manifest metadata to MongoDB and extractor row artifacts to S3.
- Ray poller queries MongoDB every 5 seconds for `PENDING` batches.
- Poller submits a Ray job with manifest details.
- Worker downloads artifacts, runs extractors in dependency tiers, and writes documents to MVS/MongoDB.
- QdrantBatchProcessor emits webhook events and updates collection index signatures.
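"Dependency tiers" means an extractor runs only after everything it depends on has run. A minimal sketch of how a worker might group extractors into tiers, assuming a hypothetical `deps` mapping of extractor name to its dependency names (the real manifest format may differ):

```python
def dependency_tiers(deps):
    """Group extractors into tiers: an extractor lands in the first tier
    after all of its dependencies. `deps` maps name -> set of dependencies.
    """
    tier_of = {}

    def tier(name):
        if name not in tier_of:
            ds = deps.get(name, set())
            tier_of[name] = 0 if not ds else 1 + max(tier(d) for d in ds)
        return tier_of[name]

    for name in deps:
        tier(name)

    grouped = {}
    for name, t in tier_of.items():
        grouped.setdefault(t, []).append(name)
    return [sorted(grouped[t]) for t in sorted(grouped)]
```

Each returned tier can be executed in parallel; tiers themselves run in order.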
Monitoring
Real-time progress
Poll the batch endpoint to get live progress. The response includes a `progress` object with real-time processing metrics:
| Field | Description |
|---|---|
| `progress.processed` | Number of objects processed so far |
| `progress.total` | Total objects in the batch |
| `progress.percent` | Completion percentage |
| `progress.items_per_second` | Current processing throughput |
| `progress.eta_seconds` | Estimated seconds until completion |
| `progress.current_stage` | Current pipeline stage (name, index, total) |
| `progress.errors` | Number of processing errors |
| `status_message` | Human-readable summary (e.g. "Processing 21,611/46,160 objects (46.8%) · 1.4 items/sec · ~4h 57m remaining") |
| `health` | Overall batch health (`healthy`, `degraded`, `unhealthy`) |
| `estimated_completion` | ISO 8601 timestamp of estimated completion |
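The API computes `status_message` server-side, but if you want to render your own summary from the raw progress fields, a sketch (the formatting choices here are illustrative, not the service's exact algorithm):

```python
def format_status(processed: int, total: int, items_per_second: float) -> str:
    """Build a human-readable summary from progress fields."""
    percent = 100.0 * processed / total
    eta = int((total - processed) / items_per_second)  # naive ETA from current rate
    hours, rem = divmod(eta, 3600)
    minutes = rem // 60
    return (f"Processing {processed:,}/{total:,} objects "
            f"({percent:.1f}%) · {items_per_second:.1f} items/sec · "
            f"~{hours}h {minutes}m remaining")
```

Note that an ETA projected from the instantaneous rate can drift from the server's `eta_seconds`, which may smooth over a longer window.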
Other monitoring options
- `GET /v1/tasks/<task_id>` – track task-level progress (Redis TTL ≈ 24h).
- `GET /v1/buckets/<bucket_id>/batches/<batch_id>/health` – batch health check.
- Webhook events (`collection.documents.written`) notify you when documents land.
Scaling Tips
- Chunk large imports into batches of 1k–10k objects to keep pollers responsive.
- Parallelize submissions—pollers handle multiple batches concurrently.
- Use namespaces to isolate environments; pollers are namespace-aware.
- Retry safely—batch submission and task polling are idempotent.
- Pipeline scheduling: use Celery Beat or another orchestrator to submit batches on a cron schedule.
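The first tip above (chunking large imports) can be sketched as a small generator; the 5,000 default is just a midpoint of the suggested 1k–10k range:

```python
def chunked(objects, size=5000):
    """Split a large import into batch-sized chunks (1k-10k keeps pollers responsive)."""
    for i in range(0, len(objects), size):
        yield objects[i:i + size]
```

Each chunk becomes one batch; since submission is idempotent, failed chunks can simply be resubmitted.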
Related APIs
- Create Batch
- Add Objects to Batch
- Submit Batch
- List Batches
- Get Batch Health
- Delete Batch
- Tasks for status and error handling