Jobs accept the same JSON body you'd POST to `/v1/messages` or
`/v1/chat/completions`, but wrapped in `/v1/jobs`. qlaud returns a job id
immediately; the actual upstream call runs on a Cloudflare Queue consumer,
and the response is retrievable via `GET /v1/jobs/:id`.
## POST /v1/jobs — Submit

### Body
| Field | Type | Required | Description |
|---|---|---|---|
| `endpoint` | `"/v1/messages"` \| `"/v1/chat/completions"` | yes | Which underlying endpoint to invoke. |
| `body` | object | yes | The same JSON body you'd POST to that endpoint synchronously. `stream: true` is stripped; async is non-streaming-only for v1. |
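As a sketch of assembling a submit payload client-side (the `makeJobPayload` helper and the model name are hypothetical, not part of the qlaud API; field names follow the table above):

```typescript
// Hypothetical helper: wrap a normal chat-completions request body in a
// /v1/jobs submit payload. Field names follow the table above.
type JobEndpoint = "/v1/messages" | "/v1/chat/completions";

interface JobSubmitBody {
  endpoint: JobEndpoint;
  body: Record<string, unknown>;
}

function makeJobPayload(
  endpoint: JobEndpoint,
  body: Record<string, unknown>,
): JobSubmitBody {
  // stream: true is stripped server-side anyway (v1 is non-streaming),
  // so drop it client-side to avoid confusion.
  const { stream, ...rest } = body;
  return { endpoint, body: rest };
}

const payload = makeJobPayload("/v1/chat/completions", {
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize this document." }],
  stream: true,
});
```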
### Response (202)
## GET /v1/jobs/:id — Poll
The response shape varies by status: while the job is still queued or running, no result is included yet.

### Status transitions
- queued→running: ~5–10 s (queue dispatch + cold consumer start)
- running→succeeded: same as the synchronous call would have taken (mostly upstream model latency)
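Given those transitions, a client can start polling near the dispatch window and back off from there. A minimal sketch (the delay schedule and the injected `fetchStatus` stand-in are assumptions, not a qlaud recommendation):

```typescript
// Delay schedule for polling GET /v1/jobs/:id. Starts near the ~5–10 s
// queued→running window, then backs off geometrically up to a cap.
function pollDelaysMs(attempts: number, baseMs = 2000, capMs = 30000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * 2 ** i, capMs));
  }
  return delays;
}

// Poll until the job leaves queued/running or attempts are exhausted.
// fetchStatus stands in for a GET /v1/jobs/:id call.
async function pollJob(
  fetchStatus: () => Promise<string>,
  attempts = 8,
): Promise<string> {
  for (const delay of pollDelaysMs(attempts)) {
    const status = await fetchStatus();
    if (status !== "queued" && status !== "running") return status;
    await new Promise((r) => setTimeout(r, delay));
  }
  return "timeout";
}
```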
## When to use jobs vs. synchronous
| Use case | Recommended |
|---|---|
| Interactive chat UI | Synchronous (/v1/messages) or streaming threads |
| Batch processing N documents | Jobs |
| Long-context (>30 s of generation) | Jobs |
| Mobile / serverless client that can’t hold SSE open | Jobs |
| Background work triggered by a webhook | Jobs |
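For the batch row above, fan-out is just N submits followed by N polls. A sketch of the submit side (`submitJob` is a hypothetical wrapper around `POST /v1/jobs`, injected here so the fan-out logic runs without a network):

```typescript
// Fan N documents out as async jobs and collect the returned job ids.
// submitJob stands in for a POST /v1/jobs call.
async function submitBatch(
  docs: string[],
  submitJob: (payload: Record<string, unknown>) => Promise<string>,
): Promise<string[]> {
  return Promise.all(
    docs.map((doc) =>
      submitJob({
        endpoint: "/v1/chat/completions",
        body: { messages: [{ role: "user", content: `Summarize:\n${doc}` }] },
      }),
    ),
  );
}
```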
## Errors
| Status | Meaning |
|---|---|
| 400 | Invalid endpoint value, missing body, or invalid JSON |
| 401 | Bad / revoked qlk key |
| 402 | Wallet exhausted OR per-key cap exceeded (pre-flight check) |
| 404 | GET /v1/jobs/:id — job not found OR not owned by caller |
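One way to act on these codes client-side (a sketch; the category names and the idea that none of these warrant automatic retries are assumptions):

```typescript
// Classify a /v1/jobs HTTP status per the table above.
// 401/402 need operator action (key or wallet), 400 is a caller bug,
// and 404 on GET usually means a wrong or foreign job id.
type JobErrorKind = "bad_request" | "auth" | "billing" | "not_found" | "ok" | "unknown";

function classifyJobStatus(status: number): JobErrorKind {
  switch (status) {
    case 400: return "bad_request";
    case 401: return "auth";
    case 402: return "billing";
    case 404: return "not_found";
    default:
      return status >= 200 && status < 300 ? "ok" : "unknown";
  }
}
```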
## Limits (v1)
- Streaming jobs (chunk persistence + retrieval) are a separate phase. For now, jobs always force `stream: false` upstream and return the full response on `GET`.
- No webhook-on-completion (poll for now). Coming later via Svix.
- Job results are stored inline in D1 (the 1 MB row cap covers virtually every chat-completion response). Larger response bodies move to R2 in a future release.