Threads are qlaud’s conversation primitive. Create a thread once, then send only the new user turn on each call: qlaud loads the prior history, calls the upstream model, persists both turns, and returns the assistant message in the standard Anthropic Messages shape. This removes the need for a per-app messages table and a context-window loader, and it settles the “how do I switch models mid-conversation” question, because every endpoint below uses the same Anthropic shape regardless of the underlying model.

POST /v1/threads — Create

curl https://api.qlaud.ai/v1/threads \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "end_user_id": "user_42",
    "metadata": {"plan": "pro", "feature": "/refunds"}
  }'

Body

| Field | Type | Required | Description |
|---|---|---|---|
| end_user_id | string | no | Opaque id for YOUR end-user (distinct from your qlaud account). Used to filter /v1/threads listings and /v1/search results. |
| metadata | object | no | Arbitrary JSON. Stored verbatim, surfaced on read paths. |

Response (201)

{
  "id": "2f1d0c7f-e2a1-40e4-8e21-182cf27deeb7",
  "object": "thread",
  "end_user_id": "user_42",
  "metadata": {"plan": "pro", "feature": "/refunds"},
  "created_at": 1777262997717,
  "last_active_at": 1777262997717
}
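
For callers not using curl, the same request can be assembled with Python's standard library. This is a minimal sketch that assumes only the endpoint, headers, and body fields shown above; the helper name `build_create_thread_request` is hypothetical, and the request is constructed but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.qlaud.ai"  # base URL taken from the curl example above


def build_create_thread_request(api_key, end_user_id=None, metadata=None):
    """Build (but do not send) a POST /v1/threads request.

    Hypothetical helper: both body fields are optional, so only the
    ones the caller supplies are included in the JSON body.
    """
    body = {}
    if end_user_id is not None:
        body["end_user_id"] = end_user_id
    if metadata is not None:
        body["metadata"] = metadata
    return urllib.request.Request(
        f"{API_BASE}/v1/threads",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


req = build_create_thread_request(
    "example-key", end_user_id="user_42", metadata={"plan": "pro"}
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.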

POST /v1/threads/:id/messages — Send a turn

This is the core endpoint. The caller sends only the new user content; qlaud loads the thread history, runs the upstream call, persists both turns, and returns the assistant response.
curl https://api.qlaud.ai/v1/threads/$THREAD_ID/messages \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "content": "What did we just discuss?"
  }'

Body

Standard Anthropic Messages fields PLUS:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Any catalog model id. |
| max_tokens | number | yes | Response cap. |
| content | string or content blocks | yes | The NEW user turn (NOT a messages array). |
| stream | boolean | no | When true, returns Anthropic SSE, including across tool-dispatch iterations. See Streaming below. |
| tools | string[] | no | Array of registered tool IDs (see /v1/tools). qlaud handles the dispatch loop. Compatible with stream: true. |
| tools_mode | "dynamic" or "explicit" | no | Defaults to "dynamic" when no tools array is passed (the model gets 4 meta-tools, then auto-discovers and dispatches anything in the catalog). Defaults to "explicit" when tools is provided. |
| system, tool_choice, temperature, top_p, stop_sequences | (varies) | no | Passed through to upstream verbatim. |

Response

Standard Anthropic Messages response with three extras attached (thread_id, seq, cost_micros):
{
  "id": "msg_xxx",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "..."}],
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 12, "output_tokens": 18 },
  "thread_id": "2f1d0c7f-...",
  "seq": 4,
  "cost_micros": 465
}
When stream: true, the response is text/event-stream instead and the thread/seq attribution lands in headers:
content-type: text/event-stream
x-qlaud-thread-id: 2f1d0c7f-...
x-qlaud-assistant-seq: 4
Cross-shape works: pass model: "gpt-5.4" to a thread of Claude turns and qlaud translates transparently. The conversation history persists; the underlying model can change per request.
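
Because the non-streaming response is a standard Messages object plus the qlaud extras, client code can read it the same way whichever model served the turn. The sketch below is illustrative only: `summarize_turn` is a hypothetical helper, and the sample dict mirrors the response shown above with placeholder values.

```python
def summarize_turn(response):
    """Collect the assistant text and the qlaud extras from a turn response."""
    # Concatenate only text blocks; other block types (e.g. tool_use) are skipped.
    text = "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
    return {
        "text": text,
        "thread_id": response.get("thread_id"),
        "seq": response.get("seq"),
        "cost_micros": response.get("cost_micros"),
    }


# Sample shaped like the documented response, with placeholder values.
sample = {
    "id": "msg_xxx",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "We discussed refunds."}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 18},
    "thread_id": "2f1d0c7f-...",
    "seq": 4,
    "cost_micros": 465,
}
summary = summarize_turn(sample)
```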

Streaming

stream: true returns an Anthropic-shape SSE stream of event: / data: pairs. Standard Anthropic events flow through verbatim:
  • message_start, message_delta, message_stop
  • content_block_start, content_block_delta, content_block_stop
  • ping
Tool dispatch is multiplexed inline. When the model calls a tool mid-stream, qlaud’s dispatch loop runs the tool (webhook / built-in / MCP / meta-tool) and streams the result back into the same SSE connection — your client never has to make a second HTTP call. We inject our own qlaud-prefixed events between the standard Anthropic events so your UI can render the running/done state of each tool inline:
| Event | When it fires | Payload |
|---|---|---|
| qlaud.iteration_start | Each loop iteration of the dispatch (one per model turn) | { iteration: 1, request_id: "..." } |
| qlaud.tool_dispatch_start | Right before a tool call is dispatched | { tool_use_id, name, input } |
| qlaud.tool_dispatch_done | After the tool completes (success or error) | { tool_use_id, name, is_error: false, output } |
| qlaud.done | Final event before the stream closes | { thread_id, seq, cost_micros, iterations } |
Example stream for a “create a Linear ticket” turn (truncated):
event: message_start
data: {"type":"message_start","message":{"id":"msg_01...","model":"claude-sonnet-4-6",...}}

event: qlaud.iteration_start
data: {"iteration":1,"request_id":"req_xxx"}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll create that ticket."}}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_xxx","name":"qlaud_search_tools","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"intent\":\"linear ticket\"}"}}

event: qlaud.tool_dispatch_start
data: {"tool_use_id":"toolu_xxx","name":"qlaud_search_tools","input":{"intent":"linear ticket"}}

event: qlaud.tool_dispatch_done
data: {"tool_use_id":"toolu_xxx","name":"qlaud_search_tools","is_error":false,"output":{"results":[],"available_connectors":[{"vendor":"Linear","catalog_slug":"qlaud-mcp/linear",...}]}}

event: qlaud.iteration_start
data: {"iteration":2,"request_id":"req_xxx"}

... (model now calls qlaud_manage_connections.connect, then linear/create_issue, etc.)

event: message_stop
data: {"type":"message_stop"}

event: qlaud.done
data: {"thread_id":"2f1d0c7f-...","seq":4,"cost_micros":18996,"iterations":3}
Client parsing pattern: treat any event: starting with qlaud. as a side-channel UI signal (render a spinner, show “calling Linear…”, update a progress bar). All standard Anthropic events feed your existing message-renderer untouched.
When stream: false (default), qlaud runs the same dispatch loop internally and returns only the final assistant message, in the same shape as Anthropic’s non-streaming Messages response, with the extra thread_id / seq / cost_micros fields attached.
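
The client parsing pattern can be sketched as a small SSE reader that routes events by name prefix. This is a simplified illustration, not a full SSE implementation: it handles only the `event:` and `data:` fields, assumes each event's data is one JSON document (as in the example stream), and the function names are hypothetical.

```python
import json


def _decode(event, data):
    """Turn the accumulated fields of one SSE event into (name, payload)."""
    payload = json.loads("\n".join(data)) if data else None
    return (event or "message", payload)


def parse_sse(lines):
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and (event or data):
            # A blank line terminates the current event.
            yield _decode(event, data)
            event, data = None, []
    if event or data:
        # The stream ended without a trailing blank line.
        yield _decode(event, data)


def route_events(lines):
    """Split a stream into standard Anthropic events and qlaud.* side-channel events."""
    standard, side_channel = [], []
    for name, payload in parse_sse(lines):
        target = side_channel if name.startswith("qlaud.") else standard
        target.append((name, payload))
    return standard, side_channel


stream = [
    "event: qlaud.iteration_start",
    'data: {"iteration":1,"request_id":"req_xxx"}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hi"}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
    "",
    "event: qlaud.done",
    'data: {"thread_id":"2f1d0c7f-...","seq":4,"cost_micros":18996,"iterations":3}',
]
standard, side_channel = route_events(stream)
```

In a real client the `standard` events would go to the existing message renderer and the `side_channel` events to the tool-status UI.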

All tool kinds work the same way in the stream

Whether the model calls a catalog MCP (Linear), a first-party builtin (web_search, code_execution, send_email), or a custom webhook you registered via POST /v1/tools, the dispatch flow is identical from the client’s perspective: same qlaud.tool_dispatch_* events, same multiplexing, same single SSE connection. Your custom webhook gets POSTed by qlaud just like any other tool, the result is streamed back into the same SSE, and the model continues.
For your custom tools to be auto-discoverable mid-conversation, send the request with no tools array: tools_mode defaults to "dynamic" and qlaud injects the 4 meta-tools. The model then calls qlaud_search_tools(intent: "...") and every tool you’ve registered appears in the results alongside catalog connectors (search queries the full tools table for your account, with no kind filter).
If you pin tools_mode: "explicit" and pass a specific tools: [...] array, only those tool IDs are visible. Use this when you want to restrict the surface (e.g. a dedicated “image-only” turn).
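
The two modes differ only in the request body. The sketch below builds a body for each and rejects the combination that the Errors table lists as a 400 (tools_mode "dynamic" together with a tools array). The helper name `build_turn_body` and the tool ID `tool_image_gen` are hypothetical; the field names come from the Body table above.

```python
def build_turn_body(model, max_tokens, content, tools=None, tools_mode=None, **passthrough):
    """Assemble the JSON body for POST /v1/threads/:id/messages.

    Client-side checks mirror the documented rules only; anything else
    is left for the server to validate.
    """
    if tools_mode not in (None, "dynamic", "explicit"):
        raise ValueError(f"unknown tools_mode: {tools_mode!r}")
    if tools_mode == "dynamic" and tools is not None:
        # Documented as a 400: use one or the other.
        raise ValueError("tools_mode 'dynamic' cannot be combined with a tools array")
    body = {"model": model, "max_tokens": max_tokens, "content": content}
    if tools is not None:
        body["tools"] = list(tools)
    if tools_mode is not None:
        body["tools_mode"] = tools_mode
    # system, temperature, etc. are passed through to upstream verbatim.
    body.update(passthrough)
    return body


# Dynamic discovery: omit tools entirely and let tools_mode default.
dynamic_body = build_turn_body("claude-sonnet-4-6", 1024, "Create a Linear ticket")

# Restricted surface: pin explicit mode with a hypothetical tool ID.
explicit_body = build_turn_body(
    "claude-sonnet-4-6",
    1024,
    "Draw a cat",
    tools=["tool_image_gen"],
    tools_mode="explicit",
    temperature=0.2,
)
```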

Streaming + tools — supported models

stream: true with the tool-dispatch loop runs through a unified SSE bridge: Anthropic-shape upstreams pass through verbatim, every OpenAI-shape upstream is translated to Anthropic-shape events on the fly. Your client always sees the same content_block_delta / tool_use event vocabulary regardless of which model you picked.
| Model family | stream: false + tools | stream: true + tools |
|---|---|---|
| Claude (claude-*) | ✅ | ✅ |
| OpenAI (gpt-*, o1-*, etc.) | ✅ | ✅ |
| DeepSeek (deepseek-*) | ✅ | ✅ |
| Mistral (mistral-*, codestral-*) | ✅ | ✅ |
| xAI Grok | ✅ | ✅ |
| Groq, Together, OpenRouter, Cerebras, Workers AI | ✅ | ✅ |
| Moonshot, Qwen / Alibaba, MiniMax | ✅ | ✅ |
| Google AI Studio (gemini-* via OpenAI-compat endpoint) | ✅ | ✅ |
| Vertex native (gemini-* via Vertex’s own SSE shape) | ✅ | 🚧 small follow-up |
| ElevenLabs / Deepgram (audio) | n/a | n/a |
The qlaud.tool_dispatch_start / qlaud.tool_dispatch_done / qlaud.iteration_start events fire identically across all model families. Same client code, no per-provider branches.

GET /v1/threads — List

curl 'https://api.qlaud.ai/v1/threads?end_user_id=user_42&limit=20' \
  -H "x-api-key: $QLAUD_API_KEY"

Query

| Param | Default | Description |
|---|---|---|
| limit | 20 (max 100) | Page size. |
| end_user_id | (none) | Narrow to one of your end-users. |

Response

{
  "object": "list",
  "data": [
    {
      "id": "2f1d0c7f-...",
      "object": "thread",
      "end_user_id": "user_42",
      "metadata": {"plan": "pro"},
      "created_at": 1777262997717,
      "last_active_at": 1777263012890
    }
  ]
}

GET /v1/threads/:id — Get one

Returns the same shape as a list entry. 404 if you don’t own the thread or it’s been soft-deleted.

GET /v1/threads/:id/messages — List turns

Cursor-paginated. Supports both directions so chat UIs (latest-first, scroll up for older) and log replay UIs (oldest-first, scroll down) both fit cleanly.
# Default: oldest first. (URLs are double-quoted so the shell expands $THREAD_ID.)
curl "https://api.qlaud.ai/v1/threads/$THREAD_ID/messages?limit=50" \
  -H "x-api-key: $QLAUD_API_KEY"

# Chat UI: latest 30 newest-first, then scroll-up for older.
curl "https://api.qlaud.ai/v1/threads/$THREAD_ID/messages?order=desc&limit=30" \
  -H "x-api-key: $QLAUD_API_KEY"
# response includes next_before_seq → use as cursor for the next page:
curl "https://api.qlaud.ai/v1/threads/$THREAD_ID/messages?order=desc&limit=30&before_seq=12" \
  -H "x-api-key: $QLAUD_API_KEY"

Query

| Param | Default | Description |
|---|---|---|
| limit | 50 (max 200) | Page size. |
| order | asc | asc (oldest first) or desc (newest first). |
| after_seq | (none) | Cursor for forward paging: return rows with seq > after_seq. |
| before_seq | (none) | Cursor for backward paging: return rows with seq < before_seq. |
Pagination patterns:
  • Chat UI (newest first, scroll up for older):
    initial:  ?order=desc&limit=N
    next:     ?order=desc&limit=N&before_seq={response.next_before_seq}
    
  • Log replay (oldest first, scroll down for newer):
    initial:  ?order=asc&limit=N
    next:     ?order=asc&limit=N&after_seq={response.next_after_seq}
    
The response always includes both cursor fields; the unused one is null.
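
Both pagination patterns reduce to the same loop with a different cursor pair. The sketch below assumes `has_more` signals that another page exists (the field appears in the response, but its exact semantics are an assumption here). `fetch_page` is a caller-supplied callable that takes a dict of query params and returns the decoded JSON; `fake_fetch_page` is an in-memory stand-in for the real endpoint.

```python
def iter_messages(fetch_page, order="asc", limit=50):
    """Yield every message in a thread by following the cursor fields."""
    # asc pages forward with after_seq; desc pages backward with before_seq.
    cursor_param = "after_seq" if order == "asc" else "before_seq"
    cursor_field = "next_after_seq" if order == "asc" else "next_before_seq"
    params = {"order": order, "limit": limit}
    while True:
        page = fetch_page(dict(params))
        yield from page["data"]
        cursor = page.get(cursor_field)
        if not page.get("has_more") or cursor is None:
            return
        params[cursor_param] = cursor


def fake_fetch_page(params):
    """In-memory stand-in for GET /v1/threads/:id/messages (five turns)."""
    rows = [{"seq": n} for n in range(1, 6)]
    if params["order"] == "desc":
        upper = params.get("before_seq", float("inf"))
        rows = [r for r in reversed(rows) if r["seq"] < upper]
    else:
        rows = [r for r in rows if r["seq"] > params.get("after_seq", 0)]
    page = rows[: params["limit"]]
    last = page[-1]["seq"] if page else None
    return {
        "object": "list",
        "data": page,
        "has_more": len(rows) > len(page),
        "next_after_seq": last if params["order"] == "asc" else None,
        "next_before_seq": last if params["order"] == "desc" else None,
    }


newest_first = [m["seq"] for m in iter_messages(fake_fetch_page, order="desc", limit=2)]
oldest_first = [m["seq"] for m in iter_messages(fake_fetch_page, order="asc", limit=2)]
```

A chat UI would typically fetch one page at a time on scroll rather than draining the generator, but the cursor handling is the same.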

Response

{
  "object": "list",
  "data": [
    {
      "seq": 1,
      "role": "user",
      "content": "My name is Bob.",
      "request_id": null,
      "created_at": 1777262997900
    },
    {
      "seq": 2,
      "role": "assistant",
      "content": [{"type": "text", "text": "Got it, Bob!"}],
      "request_id": "msg_xxx",
      "created_at": 1777262998765
    }
  ],
  "has_more": false,
  "next_after_seq": 2,
  "next_before_seq": null
}

DELETE /v1/threads/:id — Soft delete

curl -X DELETE https://api.qlaud.ai/v1/threads/$THREAD_ID \
  -H "x-api-key: $QLAUD_API_KEY"
This is a soft delete: the row stays for audit, and a hard-delete cron job sweeps it later. Subsequent GETs return 404.

Errors

| Status | Meaning |
|---|---|
| 400 | Invalid body (missing model/max_tokens/content), or tools_mode: "dynamic" and a tools array passed together (use one or the other) |
| 401 | Bad or revoked qlk key |
| 402 | Wallet exhausted or per-key cap exceeded |
| 404 | Thread not found or not owned by caller |
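
One way a client might surface these statuses is to map them to a single exception type that carries the documented meaning. This is a sketch under assumptions: `QlaudAPIError` and `raise_for_status` are hypothetical names, and only the four statuses in the table are treated as documented.

```python
class QlaudAPIError(Exception):
    """Hypothetical exception carrying the HTTP status and its documented meaning."""

    def __init__(self, status, meaning):
        super().__init__(f"{status}: {meaning}")
        self.status = status
        self.meaning = meaning


# Meanings copied from the Errors table.
DOCUMENTED_ERRORS = {
    400: "invalid body, or tools_mode 'dynamic' passed together with a tools array",
    401: "bad or revoked key",
    402: "wallet exhausted or per-key cap exceeded",
    404: "thread not found or not owned by caller",
}


def raise_for_status(status):
    """Return None for 2xx responses; raise QlaudAPIError for anything else."""
    if 200 <= status < 300:
        return None
    meaning = DOCUMENTED_ERRORS.get(status, "undocumented status")
    raise QlaudAPIError(status, meaning)
```

Callers can then branch on `err.status`, for example prompting for a top-up on 402 while treating 404 as a missing or deleted thread.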

Limits (v1)

  • History capped at last 50 turns when loading for the upstream call. Token-aware truncation comes later.
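  • Streaming + tools is not yet available on Vertex native (gemini-* via Vertex’s own SSE shape); the other model families listed under “Streaming + tools” support the combination.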
  • Each turn embeds asynchronously into /v1/search — search becomes available within a few seconds of the turn persisting.