Buckets are the ingestion layer of the warehouse: raw files land here and are organized before the Engine decomposes them into documents and features. Each bucket enforces a JSON schema describing the blobs you expect to ingest (text, image, audio, video, json, binary).

Create a Bucket

curl -sS -X POST "$MP_API_URL/v1/buckets" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "product-catalog",
    "description": "E-commerce product data",
    "schema": {
      "properties": {
        "product_text": { "type": "text", "required": true },
        "hero_image": { "type": "image" },
        "spec_sheet": { "type": "json" }
      }
    }
  }'
Response fields:
  • bucket_id
  • schema with validation metadata
  • object_count
  • created_at
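As a sketch, you can capture the bucket_id for later calls. The response body below is a hypothetical example shaped after the fields listed above (real responses also include the full schema with validation metadata):

```shell
# Hypothetical response body for illustration only
RESPONSE='{"bucket_id":"bkt_123","object_count":0,"created_at":"2025-01-01T00:00:00Z"}'

# Extract bucket_id with sed (use jq in practice if it is available)
BUCKET_ID=$(printf '%s' "$RESPONSE" | sed -n 's/.*"bucket_id":"\([^"]*\)".*/\1/p')
echo "$BUCKET_ID"   # bkt_123
```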

Bucket Schema

  • Uses a lightweight JSON schema subset (type, required, enum, description).
  • Validates each object’s blobs before storing metadata.
  • Helps collections map input fields to feature extractor targets.
Example schema fragment:
{
  "properties": {
    "transcript": {
      "type": "text",
      "description": "Full podcast transcript",
      "required": true
    },
    "audio_file": {
      "type": "audio",
      "required": true
    }
  }
}
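Validation itself happens server-side at ingestion time. As an illustrative sketch only (the object payload and the error message below are hypothetical, not the API's actual format), an object missing a blob the schema marks required: true would be rejected:

```shell
# Hypothetical object payload missing the required "audio_file" blob
OBJECT='{"transcript":"full podcast transcript text"}'

# Naive presence check for each field the schema marks required: true
for FIELD in transcript audio_file; do
  if ! printf '%s' "$OBJECT" | grep -q "\"$FIELD\""; then
    echo "missing required blob: $FIELD"
  fi
done
# prints: missing required blob: audio_file
```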

Manage Buckets

  • Get bucket: GET /v1/buckets/{bucket_id}
  • List buckets: POST /v1/buckets/list (supports filters, sort, pagination)
  • Delete bucket: DELETE /v1/buckets/{bucket_id} (removes objects and blobs)
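The endpoints above can be called the same way as the create example. This is a sketch: the bucket id is a placeholder, and the "limit"/"offset" pagination fields in the list body are assumptions, not confirmed parameter names.

```shell
BUCKET_ID="bkt_123"  # hypothetical id returned by the create call

# Get one bucket
curl -sS "$MP_API_URL/v1/buckets/$BUCKET_ID" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"

# List buckets ("limit"/"offset" are assumed pagination fields)
curl -sS -X POST "$MP_API_URL/v1/buckets/list" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{"limit": 10, "offset": 0}'

# Delete a bucket (also removes its objects and blobs)
curl -sS -X DELETE "$MP_API_URL/v1/buckets/$BUCKET_ID" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"
```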
Buckets are strictly namespace-scoped: the same bucket name can exist in different namespaces without conflict.

Bucket vs Collection

| Aspect     | Bucket                          | Collection                                 |
| ---------- | ------------------------------- | ------------------------------------------ |
| Purpose    | Raw input registry              | Processed documents + features             |
| Schema     | Blob validation                 | Output schema (deterministic)              |
| Storage    | MongoDB (metadata) + S3 (blobs) | MongoDB (metadata) + MVS (vectors/payloads)|
| Processing | None                            | Runs feature extractors via Engine         |

Best Practices

  • One bucket per data domain (products, support tickets, surveillance footage).
  • Keep schemas coarse; collections can slice the data differently downstream.
  • Use key_prefix in objects to group files (e.g., /2025/01/).
  • Leverage metadata for later filtering (set tags at ingestion time).
Buckets give you a reliable staging area for multimodal data—clean separation before you branch into multiple collection-specific processing pipelines.