For inference calls you don’t want to block on (long-context document processing, batch prompts, mobile clients that can’t hold an SSE stream open), submit the same request body you would send to /v1/messages or /v1/chat/completions, wrapped in a POST to /v1/jobs. qlaud returns a job id immediately; the actual upstream call runs on a Cloudflare Queue consumer, and the response is retrievable via GET /v1/jobs/:id.

POST /v1/jobs — Submit

curl https://api.qlaud.ai/v1/jobs \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "endpoint": "/v1/messages",
    "body": {
      "model": "claude-sonnet-4-6",
      "max_tokens": 1024,
      "messages": [{"role":"user","content":"Summarize this 50-page doc..."}]
    }
  }'

Body

  • endpoint (string, required): which underlying endpoint to invoke; must be "/v1/messages" or "/v1/chat/completions".
  • body (object, required): the same JSON body you’d POST to that endpoint synchronously. stream: true is stripped; async is non-streaming-only for v1.
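
The same wrapper works for the OpenAI-compatible endpoint. A sketch (the model name is carried over from the example above, and any "stream": true inside body would be stripped per the field notes):

curl https://api.qlaud.ai/v1/jobs \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "endpoint": "/v1/chat/completions",
    "body": {
      "model": "claude-sonnet-4-6",
      "max_tokens": 1024,
      "messages": [{"role":"user","content":"Summarize this 50-page doc..."}]
    }
  }'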

Response (202)

{
  "id": "67977425-94bb-45ac-8eaa-8379fb798296",
  "object": "job",
  "status": "queued",
  "endpoint": "/v1/messages",
  "created_at": 1777262218214
}

GET /v1/jobs/:id — Poll

curl https://api.qlaud.ai/v1/jobs/$JOB_ID \
  -H "x-api-key: $QLAUD_API_KEY"

Response shape varies by status

Queued or running:
{
  "id": "67977425-...",
  "object": "job",
  "status": "queued",
  "endpoint": "/v1/messages",
  "model": null,
  "created_at": 1777262218214,
  "started_at": null,
  "completed_at": null
}
Succeeded:
{
  "id": "67977425-...",
  "object": "job",
  "status": "succeeded",
  "endpoint": "/v1/messages",
  "model": "claude-sonnet-4-6",
  "created_at": 1777262218214,
  "started_at": 1777262225083,
  "completed_at": 1777262226887,
  "response": {
    "id": "msg_xxx",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Here's a summary…"}],
    "stop_reason": "end_turn",
    "usage": { "input_tokens": 12500, "output_tokens": 240 }
  },
  "usage": { "input_tokens": 12500, "output_tokens": 240 },
  "cost_micros": 14820
}
Failed:
{
  "id": "67977425-...",
  "object": "job",
  "status": "failed",
  "endpoint": "/v1/messages",
  "completed_at": 1777262226887,
  "error": {
    "type": "job_failed",
    "message": "upstream 429: rate limited"
  }
}

Status transitions

queued  →  running  →  succeeded
                    \─→  failed
Typical wall-clock latencies:
  • queued → running: ~5–10 s (queue dispatch + cold consumer start)
  • running → succeeded: same as the synchronous call would have taken (mostly upstream model latency)
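
A minimal submit-and-poll loop in shell, assuming jq is available; the 2 s interval and 150-attempt cap (~5 minutes) are illustrative choices, not documented limits:

# Submit, then poll every 2 s until the job reaches a terminal status.
JOB_ID=$(curl -s https://api.qlaud.ai/v1/jobs \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{"endpoint":"/v1/messages","body":{"model":"claude-sonnet-4-6","max_tokens":1024,"messages":[{"role":"user","content":"Summarize this 50-page doc..."}]}}' \
  | jq -r '.id')

for attempt in $(seq 1 150); do
  JOB=$(curl -s "https://api.qlaud.ai/v1/jobs/$JOB_ID" -H "x-api-key: $QLAUD_API_KEY")
  STATUS=$(echo "$JOB" | jq -r '.status')
  case "$STATUS" in
    succeeded) echo "$JOB" | jq -r '.response.content[0].text'; break ;;
    failed)    echo "$JOB" | jq '.error' >&2; exit 1 ;;
    *)         sleep 2 ;;   # queued or running: wait and retry
  esac
done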

When to use jobs vs. synchronous

Use case                                              Recommended
Interactive chat UI                                   Synchronous (/v1/messages) or streaming threads
Batch processing N documents                          Jobs (sketch below)
Long-context (>30 s of generation)                    Jobs
Mobile / serverless client that can’t hold SSE open   Jobs
Background work triggered by a webhook                Jobs
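
For the batch row above, a sketch that submits one job per file and records the ids for later polling; the docs/*.txt glob, jobs.txt, and the prompt are illustrative, and jq ≥ 1.6 is assumed for --rawfile:

# Submit one job per document; record the job ids for later polling.
for f in docs/*.txt; do
  jq -n --rawfile doc "$f" '{
    endpoint: "/v1/messages",
    body: {
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{role: "user", content: ("Summarize:\n\n" + $doc)}]
    }
  }' | curl -s https://api.qlaud.ai/v1/jobs \
        -H "x-api-key: $QLAUD_API_KEY" \
        -H "content-type: application/json" \
        -d @- | jq -r '.id' >> jobs.txt
done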

Errors

Status  Meaning
400     Invalid endpoint value, missing body, or invalid JSON
401     Bad / revoked qlk key
402     Wallet exhausted OR per-key cap exceeded (pre-flight check)
404     GET /v1/jobs/:id — job not found OR not owned by caller
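
Submission errors are returned synchronously on the POST, so they can be handled before any polling starts. A sketch, where request.json holds the wrapper body shown earlier and the output path is illustrative:

# Branch on the POST status before polling.
HTTP_CODE=$(curl -s -o /tmp/job.json -w '%{http_code}' https://api.qlaud.ai/v1/jobs \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d @request.json)

case "$HTTP_CODE" in
  202) jq -r '.id' /tmp/job.json ;;                         # queued: start polling
  402) echo "wallet exhausted or key cap exceeded" >&2 ;;
  *)   jq '.' /tmp/job.json >&2; exit 1 ;;
esac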

Limits (v1)

  • Streaming jobs (chunk persistence + retrieval) are a separate phase. For now, jobs always force stream: false upstream and return the full response on GET.
  • No webhook-on-completion (poll for now). Coming later via Svix.
  • Job results are stored inline in D1 (1 MB row cap covers virtually every chat-completion response). Larger response bodies move to R2 in a future release.