TL;DR
| You want… | Use this |
|---|---|
| Just route to providers, bill per end-user | POST /v1/messages (or /v1/chat/completions) |
| Build a chatbot, agent, or anything with end-user identity | POST /v1/threads/:id/messages |
Side-by-side
/v1/messages | /v1/threads/:id/messages | |
|---|---|---|
| What it is | Anthropic-shape passthrough (also /v1/chat/completions for OpenAI-shape) | qlaud’s stateful Threads API |
| Conversation memory | You manage it. Pass the full messages array every turn. | Auto-loaded from prior turns. Just send the new user content. |
| Persistence | Nothing persisted (qlaud doesn’t store your messages array). | Each turn persisted in qlaud’s D1; deletable via API. |
| Tool dispatch | Forwarded as-is. You parse tool_use blocks and call your own tools. | Built-in dispatch loop runs end-to-end inside qlaud — you get the final assistant message. |
| Catalog connectors (105 vendors) | Not available — you’d have to register each as an explicit tool. | Auto-discovered via tools_mode: "dynamic" (the default). End-users authorize via hosted URL. |
| Semantic search | Not available. | GET /v1/search?q=… indexes every persisted message. |
| Per-user billing | Yes — keys + spend caps + usage rollup. | Yes — same. |
| Provider routing + fallback | Yes. | Yes. |
| Streaming | Yes (Anthropic SSE shape). | Yes (Anthropic SSE shape, with extra qlaud.tool_dispatch_* events multiplexed in). |
| Easiest migration | One-line URL swap from your existing Anthropic / OpenAI SDK. | New API surface — small code change in your chat route. |
When to use /v1/messages
- Your existing app already manages conversation history (Postgres, Supabase, in-memory, whatever).
- You don’t want a managed connector layer.
- You’re already shipping and just want billing + multi-provider routing without rewriting anything.
- One-line migration:
ANTHROPIC_BASE_URL=https://api.qlaud.ai.
When to use /v1/threads/:id/messages
- You’re building a chatbot, agent, or AI feature with end-user identity.
- You want to give the model access to Linear, GitHub, Notion, Stripe, ClickUp, and the other 100+ vendors in the catalog — without writing a per-vendor integration.
- You want conversation history without running a database for it.
- You want semantic search across past chats without running a vector index.
/v1/messages PLUS auto-managed
threads, 105 catalog connectors auto-discoverable per end-user,
semantic search, and the built-in tool dispatch loop. Send the
next user turn, qlaud handles the rest, returns the assistant
reply.
The tools_mode flag
/v1/threads/:id/messages accepts tools_mode in the body. It
controls whether the model gets the meta-tools (auto-discovery)
or only the explicit tools you list:
tools_mode | tools array | Behavior |
|---|---|---|
"dynamic" | (omit) | Default when no tools. 4 meta-tools injected. Model discovers + invokes anything in catalog + your registered tools. |
"explicit" | ["tool_id_1", ...] | Default when tools array IS provided. Only those tool IDs visible. No meta-tools. |
"dynamic" | provided | Rejected with 400 — incompatible. |
"explicit" | (omit) | Empty toolset. |
Disabling specific catalog vendors
If you want to suppress Linear (or any catalog vendor) from your end-users’ discovery without disabling all of them:/v1/mcp-catalog/enable. List currently disabled via
GET /v1/mcp-catalog/disabled.
Bringing your own MCP server or webhook tool
Both surfaces support custom tools you register yourself. They appear alongside catalog tools in dynamic-mode discovery.- Custom MCP server:
POST /v1/mcp-serverswithserver_url+ optionalauth_headers. Any HTTPS-reachable MCP server (your own, or a long-tail vendor not in our catalog). - Custom webhook tool:
POST /v1/toolswithwebhook_url+input_schema. qlaud HMAC-signs the dispatch; your endpoint returns the result.
Migration shape
Common questions
Can I mix the two on the same wallet? Yes. They share auth, billing, and key scopes. Pick per-route. If I switch from/v1/messages to Threads, what migrates? Nothing
historical — your prior conversations live in your existing DB.
New conversations start fresh in qlaud.
Does the model see prior messages on Threads? Yes —
qlaud auto-loads them. Long threads use a sliding-window strategy;
override by passing an explicit messages array if you need control.