For inference calls you don’t want to block on (long-context document processing, batch prompts, mobile clients that can’t hold an SSE stream open), submit the same request body you would send to /v1/messages or /v1/chat/completions, wrapped in a POST to /v1/jobs. qlaud returns a job id immediately; the actual upstream call runs on a Cloudflare Queue consumer, and the response is retrievable via GET /v1/jobs/:id.

POST /v1/jobs — Submit

curl https://api.qlaud.ai/v1/jobs \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "endpoint": "/v1/messages",
    "body": {
      "model": "claude-sonnet-4-6",
      "max_tokens": 1024,
      "messages": [{"role":"user","content":"Summarize this 50-page doc..."}]
    }
  }'

Body

  • endpoint (string, required): which underlying endpoint to invoke; must be "/v1/messages" or "/v1/chat/completions".
  • body (object, required): the same JSON body you’d POST to that endpoint synchronously. stream: true is stripped; async is non-streaming-only for v1.
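
The same wrapper works for the OpenAI-compatible endpoint. A sketch (the model name is carried over from the example above, and any "stream": true inside body would be stripped per the field notes):

curl https://api.qlaud.ai/v1/jobs \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "endpoint": "/v1/chat/completions",
    "body": {
      "model": "claude-sonnet-4-6",
      "max_tokens": 1024,
      "messages": [{"role":"user","content":"Summarize this 50-page doc..."}]
    }
  }'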

Response (202)

{
  "id": "67977425-94bb-45ac-8eaa-8379fb798296",
  "object": "job",
  "status": "queued",
  "endpoint": "/v1/messages",
  "created_at": 1777262218214
}

GET /v1/jobs/:id — Poll

curl https://api.qlaud.ai/v1/jobs/$JOB_ID \
  -H "x-api-key: $QLAUD_API_KEY"

Response shape varies by status

Queued or running:
{
  "id": "67977425-...",
  "object": "job",
  "status": "queued",
  "endpoint": "/v1/messages",
  "model": null,
  "created_at": 1777262218214,
  "started_at": null,
  "completed_at": null
}
Succeeded:
{
  "id": "67977425-...",
  "object": "job",
  "status": "succeeded",
  "endpoint": "/v1/messages",
  "model": "claude-sonnet-4-6",
  "created_at": 1777262218214,
  "started_at": 1777262225083,
  "completed_at": 1777262226887,
  "response": {
    "id": "msg_xxx",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Here's a summary…"}],
    "stop_reason": "end_turn",
    "usage": { "input_tokens": 12500, "output_tokens": 240 }
  },
  "usage": { "input_tokens": 12500, "output_tokens": 240 },
  "cost_micros": 14820
}
Failed:
{
  "id": "67977425-...",
  "object": "job",
  "status": "failed",
  "endpoint": "/v1/messages",
  "completed_at": 1777262226887,
  "error": {
    "type": "job_failed",
    "message": "upstream 429: rate limited"
  }
}

Status transitions

queued  →  running  →  succeeded
                    \─→  failed
Typical wall-clock latencies:
  • queued → running: ~5–10 s (queue dispatch + cold consumer start)
  • running → succeeded: same as the synchronous call would have taken (mostly upstream model latency)
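
A minimal submit-and-poll loop in shell, assuming jq is available; the 2 s interval and 150-attempt cap (~5 minutes) are illustrative choices, not documented limits:

# Submit, then poll every 2 s until the job reaches a terminal status.
JOB_ID=$(curl -s https://api.qlaud.ai/v1/jobs \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{"endpoint":"/v1/messages","body":{"model":"claude-sonnet-4-6","max_tokens":1024,"messages":[{"role":"user","content":"Summarize this 50-page doc..."}]}}' \
  | jq -r '.id')

for attempt in $(seq 1 150); do
  JOB=$(curl -s "https://api.qlaud.ai/v1/jobs/$JOB_ID" -H "x-api-key: $QLAUD_API_KEY")
  STATUS=$(echo "$JOB" | jq -r '.status')
  case "$STATUS" in
    succeeded) echo "$JOB" | jq -r '.response.content[0].text'; break ;;
    failed)    echo "$JOB" | jq '.error' >&2; exit 1 ;;
    *)         sleep 2 ;;   # queued or running: wait and retry
  esac
done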

When to use jobs vs. synchronous

Use case                                              Recommended
Interactive chat UI                                   Synchronous (/v1/messages) or streaming threads
Batch processing N documents                          Jobs (sketch below)
Long-context (>30 s of generation)                    Jobs
Mobile / serverless client that can’t hold SSE open   Jobs
Background work triggered by a webhook                Jobs
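
For the batch row above, a sketch that submits one job per file and records the ids for later polling; the docs/*.txt glob, jobs.txt, and the prompt are illustrative, and jq ≥ 1.6 is assumed for --rawfile:

# Submit one job per document; record the job ids for later polling.
for f in docs/*.txt; do
  jq -n --rawfile doc "$f" '{
    endpoint: "/v1/messages",
    body: {
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{role: "user", content: ("Summarize:\n\n" + $doc)}]
    }
  }' | curl -s https://api.qlaud.ai/v1/jobs \
        -H "x-api-key: $QLAUD_API_KEY" \
        -H "content-type: application/json" \
        -d @- | jq -r '.id' >> jobs.txt
done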

Errors

Status  Meaning
400     Invalid endpoint value, missing body, or invalid JSON
401     Bad / revoked qlk key
402     Wallet exhausted OR per-key cap exceeded (pre-flight check)
404     GET /v1/jobs/:id — job not found OR not owned by caller
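
Submission errors are returned synchronously on the POST, so they can be handled before any polling starts. A sketch, where request.json holds the wrapper body shown earlier and the output path is illustrative:

# Branch on the POST status before polling.
HTTP_CODE=$(curl -s -o /tmp/job.json -w '%{http_code}' https://api.qlaud.ai/v1/jobs \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d @request.json)

case "$HTTP_CODE" in
  202) jq -r '.id' /tmp/job.json ;;                         # queued: start polling
  402) echo "wallet exhausted or key cap exceeded" >&2 ;;
  *)   jq '.' /tmp/job.json >&2; exit 1 ;;
esac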

Limits (v1)

  • Streaming jobs (chunk persistence + retrieval) are a separate phase. For now, jobs always force stream: false upstream and return the full response on GET.
  • No webhook-on-completion (poll for now). Coming later via Svix.
  • Job results are stored inline in D1 (1 MB row cap covers virtually every chat-completion response). Larger response bodies move to R2 in a future release.