The pitch
You shipped an AI app. You have N end-users hitting Claude / GPT / Sora through your backend. Now you need:
- per-user usage tracking
- per-user spending caps
- monthly invoicing
- failed-payment handling
- one bill from one provider, not five
We mint one qlk_live_… key per user when they sign
up to your app. We meter every request to that key, enforce a hard cap, and
report per-user spend you can pipe straight into Stripe.
What you get
Per-user keys
POST /v1/keys returns a qlk_live_… you store with that user. Optional
max_spend_usd is enforced gateway-side on every request.
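A minimal sketch of minting a key at signup. Only POST /v1/keys, the qlk_live_… prefix, and max_spend_usd come from the description above; the base URL, Bearer-auth admin key, metadata field, and response field name are assumptions:

```ts
// Sketch: mint a capped per-user key at signup.
// The base URL, admin-key auth, "metadata", and the "key" response field
// are illustrative assumptions, not confirmed API shapes.
const QLAUD_API = "https://api.qlaud.example"; // placeholder base URL
const QLAUD_ADMIN_KEY = process.env.QLAUD_ADMIN_KEY!;

async function mintUserKey(userId: string): Promise<string> {
  const res = await fetch(`${QLAUD_API}/v1/keys`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${QLAUD_ADMIN_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      metadata: { user_id: userId }, // hypothetical: tie the key to your user row
      max_spend_usd: 20,             // hard cap enforced gateway-side
    }),
  });
  if (!res.ok) throw new Error(`key mint failed: ${res.status}`);
  const { key } = await res.json(); // "qlk_live_…"
  return key;                        // store alongside the user record
}
```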
Per-user usage
GET /v1/usage returns spend, requests, and tokens broken down by every
key you’ve minted. Pipe it into Stripe at month-end.
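Roughly what month-end reconciliation could look like. Only GET /v1/usage and the spend / requests / tokens breakdown are stated above; the base URL and the exact JSON shape below are guesses:

```ts
// Sketch: pull per-key usage at month-end and turn it into invoice lines.
// The response shape (keys[], spend_usd, requests, tokens) is assumed.
interface KeyUsage {
  key: string;       // the qlk_live_… key you minted for a user
  spend_usd: number;
  requests: number;
  tokens: number;
}

async function monthlyUsage(): Promise<KeyUsage[]> {
  const res = await fetch("https://api.qlaud.example/v1/usage", {
    headers: { Authorization: `Bearer ${process.env.QLAUD_ADMIN_KEY}` },
  });
  if (!res.ok) throw new Error(`usage fetch failed: ${res.status}`);
  const { keys } = (await res.json()) as { keys: KeyUsage[] };
  return keys;
}

// Map each key's spend back to your user and push it into your billing
// system (e.g. a Stripe invoice line) with whatever margin you charge.
for (const u of await monthlyUsage()) {
  console.log(`${u.key}: $${u.spend_usd.toFixed(2)} (${u.requests} requests, ${u.tokens} tokens)`);
}
```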
Frontier models
Claude Opus 4.7, GPT-5.4, Sora 2, Eleven, Whisper, Deepgram, Perplexity —
all behind one key. No per-provider integration.
Anthropic + OpenAI shape
Native /v1/messages AND /v1/chat/completions — drop-in for Claude Code,
Cursor, Cline, openai-py, LangChain, Vercel AI SDK.
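Because the shapes are native, pointing an existing SDK at qlaud is a base-URL swap. A sketch with the official openai Node SDK; the qlaud base URL is a placeholder and the model ID is illustrative:

```ts
import OpenAI from "openai";

// Each end-user gets their own client, authed with the qlk_live_… key you
// minted for them, so metering and the spend cap apply per user.
const userQlaudKey = process.env.DEMO_USER_KEY!; // in practice: load from your users table

const client = new OpenAI({
  apiKey: userQlaudKey,
  baseURL: "https://api.qlaud.example/v1", // placeholder base URL
});

const completion = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [{ role: "user", content: "Summarize my last meeting." }],
});
console.log(completion.choices[0].message.content);
```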
Who it’s for
If you’re building a product that wraps an AI model and sells it to end-users, qlaud removes the entire billing-infrastructure layer.
- Building an AI writing tool? Mint a key per writer.
- Coding agent for teams? Mint a key per developer seat.
- Voice agent SaaS? Mint a key per phone number.
- Image-gen for designers? Mint a key per designer.
The 30-second demo
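End to end it is three calls, sketched here under the same assumptions as above (placeholder base URL, guessed field names and model IDs): mint a capped key, call a model as that user, read the spend back.

```ts
// 1. Mint a capped key for a new user (POST /v1/keys, max_spend_usd from above).
const mint = await fetch("https://api.qlaud.example/v1/keys", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.QLAUD_ADMIN_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ max_spend_usd: 5 }),
});
const { key: userKey } = await mint.json(); // qlk_live_…

// 2. Call a model as that user via the OpenAI-shaped /v1/chat/completions.
const chat = await fetch("https://api.qlaud.example/v1/chat/completions", {
  method: "POST",
  headers: { Authorization: `Bearer ${userKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "claude-opus-4.7", // illustrative model ID
    messages: [{ role: "user", content: "Hello from a metered end-user." }],
  }),
});
console.log((await chat.json()).choices[0].message.content);

// 3. Read per-key spend back (GET /v1/usage) whenever you invoice.
const usage = await fetch("https://api.qlaud.example/v1/usage", {
  headers: { Authorization: `Bearer ${process.env.QLAUD_ADMIN_KEY}` },
});
console.log(await usage.json());
```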
Beyond billing — the app substrate
Once your end-users are minted as keys, qlaud manages the rest of the AI app stack so you don’t have to:
Threads
Conversation memory primitive. Send just the new turn — qlaud loads
history server-side, persists both sides, returns the assistant
response. Kills the messages table.
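What a thread turn might look like. Only the behavior (send just the new turn, history loaded and persisted server-side) is stated above; the thread_id field, base URL, and model ID are assumptions:

```ts
// Sketch: one turn in a thread. qlaud loads prior turns server-side,
// persists this user message plus the assistant reply, and returns the reply.
// "thread_id" is a hypothetical field name, not a confirmed parameter.
async function sendTurn(userKey: string, threadId: string, text: string) {
  const res = await fetch("https://api.qlaud.example/v1/messages", {
    method: "POST",
    headers: { Authorization: `Bearer ${userKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "claude-opus-4.7",                     // illustrative model ID
      max_tokens: 1024,
      thread_id: threadId,                          // hypothetical
      messages: [{ role: "user", content: text }],  // just the new turn
    }),
  });
  return res.json(); // assistant response; history already persisted server-side
}
```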
Tools
Register a webhook URL once. When the assistant emits tool_use,
qlaud calls your endpoint, awaits the result, re-calls the model.
Cross-provider — same shape for Claude or GPT.
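Your side of that loop is one HTTP handler. A sketch of what it might look like; the payload and reply shapes are guesses, only the register-once, call-on-tool_use, re-call-the-model flow comes from the description above:

```ts
// Sketch: the webhook qlaud calls when the assistant emits tool_use.
// The "name"/"input" payload and the { result } reply are assumed shapes.
import { createServer } from "node:http";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const { name, input } = JSON.parse(body); // hypothetical fields
    // Run the tool and return its result; qlaud feeds it back to the model.
    const result =
      name === "get_weather" ? { tempC: 21, city: input.city } : { error: "unknown tool" };
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ result }));
  });
}).listen(8787);
```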
Search
Every turn auto-embedded into Cloudflare Vectorize. Query with plain
text, get tenant-isolated semantic hits. No vector DB to provision.
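Querying could be as small as this. Only plain-text queries and tenant-isolated hits are stated above; the /v1/search path, base URL, and response fields are assumptions:

```ts
// Sketch: semantic search over a user's auto-embedded turns.
// "/v1/search" and the "hits" field are hypothetical names.
async function searchTurns(userKey: string, query: string) {
  const res = await fetch("https://api.qlaud.example/v1/search", {
    method: "POST",
    headers: { Authorization: `Bearer ${userKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ query }),   // plain text in
  });
  const { hits } = await res.json();   // hits scoped to this key's tenant
  return hits;
}
```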
Jobs
Async submit + polled retrieval for long-running batch work. Same
request body as the synchronous endpoints, wrapped in
/v1/jobs.
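A submit-then-poll sketch. The same-body-as-sync wrapping and /v1/jobs come from the description above; the job id and status fields, the GET-by-id path, and the base URL are assumptions:

```ts
// Sketch: submit a long-running request, then poll until it finishes.
// Field names ("id", "status", "result") and the GET path are guesses.
async function runJob(userKey: string, body: unknown) {
  const headers = { Authorization: `Bearer ${userKey}`, "Content-Type": "application/json" };

  // Same request body as the synchronous endpoint, wrapped in /v1/jobs.
  const submit = await fetch("https://api.qlaud.example/v1/jobs", {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  });
  const { id } = await submit.json();

  // Poll until the job completes or fails.
  while (true) {
    const poll = await fetch(`https://api.qlaud.example/v1/jobs/${id}`, { headers });
    const job = await poll.json();
    if (job.status === "completed") return job.result;
    if (job.status === "failed") throw new Error("job failed");
    await new Promise((r) => setTimeout(r, 2000));
  }
}
```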