Threads are qlaud’s conversation primitive. Create a thread once, then send
just the new user turn each call — qlaud loads the prior history, calls the
upstream model, persists both turns, and returns the assistant message in
standard Anthropic Messages shape.
This removes the per-app messages table, the context-window loader, and the
“how do I switch models mid-conversation?” question: every endpoint that
follows uses the same Anthropic shape regardless of the underlying model.
POST /v1/threads — Create
```bash
curl https://api.qlaud.ai/v1/threads \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "end_user_id": "user_42",
    "metadata": {"plan": "pro", "feature": "/refunds"}
  }'
```
Body
| Field | Type | Required | Description |
|---|---|---|---|
| end_user_id | string | no | Opaque ID for YOUR end user (distinct from your qlaud account). Used to filter /v1/threads listings and /v1/search results. |
| metadata | object | no | Arbitrary JSON. Stored verbatim and returned on read paths. |
Response (201)
```json
{
  "id": "2f1d0c7f-e2a1-40e4-8e21-182cf27deeb7",
  "object": "thread",
  "end_user_id": "user_42",
  "metadata": {"plan": "pro", "feature": "/refunds"},
  "created_at": 1777262997717,
  "last_active_at": 1777262997717
}
```
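Here is the same create call in Python, using only the standard library. This is a minimal sketch. The helper names `create_thread_request` and `create_thread` are hypothetical, and the caller is assumed to pass in the API key. The request builder is kept separate from the network call so the payload can be inspected without sending anything.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

API_BASE = "https://api.qlaud.ai"


def create_thread_request(api_key: str,
                          end_user_id: Optional[str] = None,
                          metadata: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/threads request."""
    body: Dict[str, Any] = {}
    # Both fields are optional, so only include the ones the caller supplied.
    if end_user_id is not None:
        body["end_user_id"] = end_user_id
    if metadata is not None:
        body["metadata"] = metadata
    return urllib.request.Request(
        f"{API_BASE}/v1/threads",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


def create_thread(api_key: str,
                  end_user_id: Optional[str] = None,
                  metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send the request and return the decoded thread object (201 on success)."""
    req = create_thread_request(api_key, end_user_id, metadata)
    # urlopen raises urllib.error.HTTPError for non-2xx statuses.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `create_thread(os.environ["QLAUD_API_KEY"], "user_42", {"plan": "pro"})` should return a body shaped like the 201 response above. Keep its `id`, because every later call addresses the thread by that ID.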
POST /v1/threads/:id/messages — Send a turn
The core endpoint. You send only the new user content; qlaud loads the
thread history, runs the upstream call, persists both turns, and returns
the assistant response.
```bash
curl https://api.qlaud.ai/v1/threads/$THREAD_ID/messages \
  -H "x-api-key: $QLAUD_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "content": "What did we just discuss?"
  }'
```
Body
Standard Anthropic Messages fields PLUS:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Any catalog model ID. |
| max_tokens | number | yes | Response cap. |
| content | string \| content blocks | yes | The NEW user turn (NOT a messages array). |
| stream | boolean | no | When true, returns Anthropic SSE, including across tool-dispatch iterations. See Streaming below. |
| tools | string[] | no | Array of registered tool IDs (see /v1/tools). qlaud handles the dispatch loop. Compatible with stream: true. |
| tools_mode | "dynamic" \| "explicit" | no | Defaults to "dynamic" when no tools array is passed (the model gets 4 meta-tools and auto-discovers and dispatches anything in the catalog). Defaults to "explicit" when tools is provided. |
| system, tool_choice, temperature, top_p, stop_sequences | as in Anthropic Messages | no | Passed through to upstream verbatim. |
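The body rules above can be checked on the client before the request goes out. There are three required fields, and tools_mode: "dynamic" must never be combined with a tools array (the Errors section lists that combination as a 400). The sketch below uses a hypothetical `build_turn_body` helper. It forwards any extra keyword arguments (system, temperature, and so on) unchanged, matching the pass-through described in the table. Rejecting unknown tools_mode values is an assumption based on the two documented values.

```python
from typing import Any, Dict, List, Optional, Union

Content = Union[str, List[Dict[str, Any]]]  # a string or Anthropic content blocks


def build_turn_body(model: str, max_tokens: int, content: Content, *,
                    stream: bool = False,
                    tools: Optional[List[str]] = None,
                    tools_mode: Optional[str] = None,
                    **passthrough: Any) -> Dict[str, Any]:
    """Assemble a POST /v1/threads/:id/messages body, rejecting documented 400 cases early."""
    if not model or not max_tokens or not content:
        raise ValueError("model, max_tokens and content are all required")
    if tools_mode not in (None, "dynamic", "explicit"):
        raise ValueError(f"unknown tools_mode: {tools_mode!r}")
    if tools_mode == "dynamic" and tools is not None:
        raise ValueError('tools_mode "dynamic" cannot be combined with a tools array')
    body: Dict[str, Any] = {"model": model, "max_tokens": max_tokens, "content": content}
    if stream:
        body["stream"] = True
    if tools is not None:
        body["tools"] = list(tools)  # registered tool IDs; qlaud runs the dispatch loop
    if tools_mode is not None:
        body["tools_mode"] = tools_mode
    # system, tool_choice, temperature, top_p, stop_sequences are forwarded as-is.
    body.update(passthrough)
    return body
```

For example, `build_turn_body("claude-sonnet-4-6", 1024, "What did we just discuss?")` produces exactly the body from the curl example above.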
Response
Standard Anthropic Messages response with two extras attached:
```json
{
  "id": "msg_xxx",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "..."}],
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 12, "output_tokens": 18 },
  "thread_id": "2f1d0c7f-...",
  "seq": 4,
  "cost_micros": 465
}
```
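The sketch below shows one way to read this response: join the text blocks and keep qlaud's thread_id, seq, and cost_micros alongside them. `summarize_turn` and `TurnResult` are hypothetical names. The sketch reads only "text" blocks and skips other block types such as tool_use.

```python
from typing import Any, Dict, NamedTuple, Optional


class TurnResult(NamedTuple):
    text: str
    thread_id: str
    seq: int
    cost_micros: int
    stop_reason: Optional[str]


def summarize_turn(resp: Dict[str, Any]) -> TurnResult:
    """Extract the assistant text and qlaud's extra fields from a non-streaming response."""
    # content is a list of Anthropic content blocks; only "text" blocks carry prose.
    text = "".join(block.get("text", "") for block in resp.get("content", [])
                   if block.get("type") == "text")
    return TurnResult(text=text, thread_id=resp["thread_id"], seq=resp["seq"],
                      cost_micros=resp["cost_micros"], stop_reason=resp.get("stop_reason"))
```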
When stream: true, the response is text/event-stream instead, and the
thread ID and assistant seq arrive in response headers:
```text
content-type: text/event-stream
x-qlaud-thread-id: 2f1d0c7f-...
x-qlaud-assistant-seq: 4
```
Switching model families mid-thread works: pass model: "gpt-5.4" to a
thread of Claude turns and qlaud translates between shapes transparently.
The conversation history persists; the underlying model can change per request.
Streaming
stream: true returns an Anthropic-shape SSE stream of event: /
data: pairs. Standard Anthropic events flow through verbatim:
- message_start, message_delta, message_stop
- content_block_start, content_block_delta, content_block_stop
- ping
Tool dispatch is multiplexed inline. When the model calls a tool
mid-stream, qlaud’s dispatch loop runs the tool (webhook / built-in /
MCP / meta-tool) and streams the result back into the same SSE
connection — your client never has to make a second HTTP call. We
inject our own qlaud-prefixed events between the standard Anthropic
events so your UI can render the running/done state of each tool
inline:
| Event | When it fires | Payload |
|---|---|---|
| qlaud.iteration_start | Each loop iteration of the dispatch (one per model turn) | { iteration: 1, request_id: "..." } |
| qlaud.tool_dispatch_start | Right before a tool call is dispatched | { tool_use_id, name, input } |
| qlaud.tool_dispatch_done | After the tool completes (success or error) | { tool_use_id, name, is_error: false, output } |
| qlaud.done | Final event before the stream closes | { thread_id, seq, cost_micros, iterations } |
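One way to drive the running/done UI from these events is to fold them into a per-tool state map keyed by tool_use_id. The sketch below is a hypothetical client-side reducer (`apply_qlaud_event` is not part of any qlaud SDK), and it reads only the payload fields listed in the table.

```python
from typing import Any, Dict


def apply_qlaud_event(state: Dict[str, Any], event: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Fold one qlaud.* event into a UI state dict (mutates and returns it)."""
    tools = state.setdefault("tools", {})
    if event == "qlaud.iteration_start":
        state["iteration"] = data["iteration"]
    elif event == "qlaud.tool_dispatch_start":
        # A tool call is about to run: show it as in progress.
        tools[data["tool_use_id"]] = {"name": data["name"], "status": "running"}
    elif event == "qlaud.tool_dispatch_done":
        # setdefault tolerates a done event whose start event was missed.
        entry = tools.setdefault(data["tool_use_id"], {"name": data["name"]})
        entry["status"] = "error" if data.get("is_error") else "done"
        entry["output"] = data.get("output")
    elif event == "qlaud.done":
        state["finished"] = True
        state["summary"] = {k: data.get(k) for k in ("thread_id", "seq", "cost_micros", "iterations")}
    # Any other qlaud.* event is ignored, so new server-side events do not break the UI.
    return state
```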
Example stream for a “create a Linear ticket” turn (truncated):
```text
event: message_start
data: {"type":"message_start","message":{"id":"msg_01...","model":"claude-sonnet-4-6",...}}

event: qlaud.iteration_start
data: {"iteration":1,"request_id":"req_xxx"}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll create that ticket."}}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_xxx","name":"qlaud_search_tools","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"intent\":\"linear ticket\"}"}}

event: qlaud.tool_dispatch_start
data: {"tool_use_id":"toolu_xxx","name":"qlaud_search_tools","input":{"intent":"linear ticket"}}

event: qlaud.tool_dispatch_done
data: {"tool_use_id":"toolu_xxx","name":"qlaud_search_tools","is_error":false,"output":{"results":[],"available_connectors":[{"vendor":"Linear","catalog_slug":"qlaud-mcp/linear",...}]}}

event: qlaud.iteration_start
data: {"iteration":2,"request_id":"req_xxx"}

... (model now calls qlaud_manage_connections.connect, then linear/create_issue, etc.)

event: message_stop
data: {"type":"message_stop"}

event: qlaud.done
data: {"thread_id":"2f1d0c7f-...","seq":4,"cost_micros":18996,"iterations":3}
```
Client parsing pattern: treat any event: starting with qlaud. as a
side-channel UI signal (render a spinner, show “calling Linear…”,
update a progress bar). All standard Anthropic events feed your
existing message-renderer untouched.
When stream: false (default), qlaud runs the same dispatch loop
internally and only returns the final assistant message — same
shape as Anthropic’s non-streaming Messages response, with the
extra thread_id / seq / cost_micros fields attached.
Whether the model calls a catalog MCP (Linear), a first-party
builtin (web_search, code_execution, send_email), or a
custom webhook you registered via
POST /v1/tools, the dispatch flow is
identical from the client’s perspective. Same qlaud.tool_dispatch_*
events, same multiplexing, same single SSE connection. Your custom
webhook gets POSTed by qlaud just like any other tool, the result is
streamed back into the same SSE, and the model continues.
For your custom tools to be auto-discoverable mid-conversation, send
the request with no tools array: tools_mode defaults to
"dynamic" and qlaud injects the 4 meta-tools. The model then calls
qlaud_search_tools(intent: "...") and every tool you’ve registered
appears in the results alongside catalog connectors (search covers the
full tools table for your account, with no filter on tool kind).
If you pin tools_mode: "explicit" and pass a specific tools: [...]
array, only those tool IDs are visible. Use this when you want to
restrict the surface (e.g. a dedicated “image-only” turn).
stream: true with the tool-dispatch loop runs through a unified
SSE bridge: Anthropic-shape upstreams pass through verbatim, every
OpenAI-shape upstream is translated to Anthropic-shape events on the
fly. Your client always sees the same content_block_delta /
tool_use event vocabulary regardless of which model you picked.
| Model family | stream: false + tools | stream: true + tools |
|---|---|---|
| Claude (claude-*) | ✅ | ✅ |
| OpenAI (gpt-*, o1-*, etc.) | ✅ | ✅ |
| DeepSeek (deepseek-*) | ✅ | ✅ |
| Mistral (mistral-*, codestral-*) | ✅ | ✅ |
| xAI Grok | ✅ | ✅ |
| Groq, Together, OpenRouter, Cerebras, Workers AI | ✅ | ✅ |
| Moonshot, Qwen / Alibaba, MiniMax | ✅ | ✅ |
| Google AI Studio (gemini-* via OpenAI-compat endpoint) | ✅ | ✅ |
| Vertex native (gemini-* via Vertex’s own SSE shape) | ✅ | 🚧 small follow-up |
| ElevenLabs / Deepgram (audio) | n/a | n/a |
The qlaud.tool_dispatch_start / qlaud.tool_dispatch_done /
qlaud.iteration_start events fire identically across all model
families. Same client code, no per-provider branches.
GET /v1/threads — List
```bash
curl 'https://api.qlaud.ai/v1/threads?end_user_id=user_42&limit=20' \
  -H "x-api-key: $QLAUD_API_KEY"
```
Query
| Param | Default | Description |
|---|---|---|
| limit | 20 (max 100) | Page size. |
| end_user_id | (none) | Narrow to one of your end users. |
Response
```json
{
  "object": "list",
  "data": [
    {
      "id": "2f1d0c7f-...",
      "object": "thread",
      "end_user_id": "user_42",
      "metadata": {"plan": "pro"},
      "created_at": 1777262997717,
      "last_active_at": 1777263012890
    }
  ]
}
```
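When you build the list URL in code, a URL encoder keeps arbitrary end_user_id values safe. The sketch below uses a hypothetical `list_threads_url` helper. Clamping limit into 1..100 is an assumption based on the documented maximum, since the API's behaviour for out-of-range values is not specified here.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

API_BASE = "https://api.qlaud.ai"


def list_threads_url(end_user_id: Optional[str] = None, limit: int = 20) -> str:
    """Build a GET /v1/threads URL; limit is clamped to 1..100 (assumed bounds)."""
    params: Dict[str, Any] = {"limit": max(1, min(limit, 100))}
    if end_user_id is not None:
        # urlencode percent-encodes spaces, slashes, etc. in caller-supplied IDs.
        params["end_user_id"] = end_user_id
    return f"{API_BASE}/v1/threads?{urlencode(params)}"
```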
GET /v1/threads/:id — Get one
Returns the same shape as a list entry. 404 if you don’t own the thread or
it’s been soft-deleted.
GET /v1/threads/:id/messages — List turns
Cursor-paginated in both directions, so chat UIs (newest first, scroll
up for older) and log-replay UIs (oldest first, scroll down for newer)
both fit cleanly.
```bash
# Default: oldest first.
# Double quotes so the shell expands $THREAD_ID.
curl "https://api.qlaud.ai/v1/threads/$THREAD_ID/messages?limit=50" \
  -H "x-api-key: $QLAUD_API_KEY"

# Chat UI: latest 30, newest first, then scroll up for older.
curl "https://api.qlaud.ai/v1/threads/$THREAD_ID/messages?order=desc&limit=30" \
  -H "x-api-key: $QLAUD_API_KEY"

# The response includes next_before_seq; use it as the cursor for the next page:
curl "https://api.qlaud.ai/v1/threads/$THREAD_ID/messages?order=desc&limit=30&before_seq=12" \
  -H "x-api-key: $QLAUD_API_KEY"
```
Query
| Param | Default | Description |
|---|---|---|
| limit | 50 (max 200) | Page size. |
| order | asc | asc (oldest first) or desc (newest first). |
| after_seq | (none) | Cursor for forward paging: return rows with seq > after_seq. |
| before_seq | (none) | Cursor for backward paging: return rows with seq < before_seq. |
Pagination patterns:
- Chat UI (newest first, scroll up for older):
  - initial: ?order=desc&limit=N
  - next: ?order=desc&limit=N&before_seq={response.next_before_seq}
- Log replay (oldest first, scroll down for newer):
  - initial: ?order=asc&limit=N
  - next: ?order=asc&limit=N&after_seq={response.next_after_seq}
The response always includes both cursor fields; the unused one is
null.
Response
```json
{
  "object": "list",
  "data": [
    {
      "seq": 1,
      "role": "user",
      "content": "My name is Bob.",
      "request_id": null,
      "created_at": 1777262997900
    },
    {
      "seq": 2,
      "role": "assistant",
      "content": [{"type": "text", "text": "Got it, Bob!"}],
      "request_id": "msg_xxx",
      "created_at": 1777262998765
    }
  ],
  "has_more": false,
  "next_after_seq": 2,
  "next_before_seq": null
}
```
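The two pagination patterns differ only in which cursor they follow, so one loop can cover both. The sketch assumes a caller-supplied `fetch_page` function that performs one GET /v1/threads/:id/messages with the given query params and returns the decoded body. It also assumes that paging should stop once has_more is false or the relevant cursor is null. `iter_turns` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_turns(fetch_page: Callable[[Dict[str, Any]], Page],
               order: str = "asc", limit: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every turn of a thread, following next_after_seq (asc) or next_before_seq (desc)."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    cursor_param = "after_seq" if order == "asc" else "before_seq"
    cursor_field = "next_after_seq" if order == "asc" else "next_before_seq"
    params: Dict[str, Any] = {"order": order, "limit": limit}
    while True:
        page = fetch_page(dict(params))  # pass a copy so the caller cannot mutate our state
        yield from page["data"]
        cursor = page.get(cursor_field)
        if not page.get("has_more") or cursor is None:
            return
        params[cursor_param] = cursor
```

A chat UI would typically pull one page at a time instead of draining the iterator, but the cursor handling is the same either way.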
DELETE /v1/threads/:id — Soft delete
```bash
curl -X DELETE https://api.qlaud.ai/v1/threads/$THREAD_ID \
  -H "x-api-key: $QLAUD_API_KEY"
```
This is a soft delete: the row is kept for audit and removed later by a
scheduled hard-delete job. Subsequent GETs return 404.
Errors
| Status | Meaning |
|---|---|
| 400 | Invalid body (missing model, max_tokens, or content), or tools_mode: "dynamic" and a tools array passed together (use one or the other) |
| 401 | Bad or revoked qlk key |
| 402 | Wallet exhausted OR per-key cap exceeded |
| 404 | Thread not found OR not owned by caller |
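The sketch below shows one way to surface these statuses in a Python client: wrap urllib's HTTPError in an exception that carries a suggested action. The action names and `QlaudError` are hypothetical. Only the four status codes come from the table, so every other status maps to "unknown".

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict

# Hypothetical mapping from the documented statuses to a client-side action.
ERROR_ACTIONS = {
    400: "fix_request",          # bad body, or tools_mode "dynamic" plus a tools array
    401: "replace_key",          # bad or revoked qlk key
    402: "top_up_or_raise_cap",  # wallet exhausted or per-key cap exceeded
    404: "forget_thread",        # not found, soft-deleted, or not owned by the caller
}


class QlaudError(Exception):
    def __init__(self, status: int, body: str = "") -> None:
        self.status = status
        self.action = ERROR_ACTIONS.get(status, "unknown")
        super().__init__(f"qlaud returned {status} ({self.action}): {body}")


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and convert non-2xx responses into QlaudError."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise QlaudError(err.code, err.read().decode("utf-8", "replace")) from err
```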
Limits (v1)
- History is capped at the last 50 turns when loading context for the upstream call. Token-aware truncation comes later.
- Streaming + tools works across model families; the one gap is Vertex-native Gemini, where stream: true + tools is still a small follow-up (stream: false + tools works there).
- Each turn is embedded asynchronously into /v1/search; new turns become searchable within a few seconds of persisting.