The pitch

You shipped an AI app. You have N end-users hitting Claude / GPT / Sora through your backend. Now you need:
  • per-user usage tracking
  • per-user spending caps
  • monthly invoicing
  • failed-payment handling
  • one bill from one provider, not five
qlaud is the billing layer. Mint a qlk_live_… key per user when they sign up for your app. We meter every request to that key, enforce a hard cap, and report per-user spend you can pipe straight into Stripe.

What you get

Per-user keys

POST /v1/keys returns a qlk_live_… you store with that user. Optional max_spend_usd is enforced gateway-side on every request.

Per-user usage

GET /v1/usage returns spend, requests, and tokens for every key you’ve minted. Pipe it into Stripe at month-end.

Frontier models

Claude Opus 4.7, GPT-5.4, Sora 2, Eleven, Whisper, Deepgram, Perplexity — all behind one key. No per-provider integration.

Anthropic + OpenAI shape

Native /v1/messages AND /v1/chat/completions — drop-in for Claude Code, Cursor, Cline, openai-py, LangChain, Vercel AI SDK.
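Concretely, the same key drives both shapes. A minimal sketch, assuming the gateway mirrors each provider's native request body (the model ids below are placeholders, not confirmed identifiers); it builds both payloads locally without sending anything:

```python
import json

QLAUD_KEY = "qlk_live_abc..."  # hypothetical per-user key from POST /v1/keys
BASE = "https://api.qlaud.ai/v1"

turn = {"role": "user", "content": "Summarize this doc."}

# Anthropic-shaped body for POST {BASE}/messages
anthropic_body = {
    "model": "claude-opus-4.7",  # assumed model id
    "max_tokens": 1024,
    "messages": [turn],
}

# OpenAI-shaped body for POST {BASE}/chat/completions
openai_body = {
    "model": "gpt-5.4",  # assumed model id
    "messages": [turn],
}

# Both requests would carry the same auth header; only path and shape differ.
headers = {"x-api-key": QLAUD_KEY, "content-type": "application/json"}
print(json.dumps(openai_body))
```

Point an SDK's base URL at the gateway and the per-user key rides along unchanged; the cap check happens before either upstream provider is hit.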

Who it’s for

If you’re building a product that wraps an AI model and sells it to end-users, qlaud removes the entire billing-infrastructure layer.
  • Building an AI writing tool? Mint a key per writer.
  • Coding agent for teams? Mint a key per developer seat.
  • Voice agent SaaS? Mint a key per phone number.
  • Image-gen for designers? Mint a key per designer.
You write your app. We do the metering, capping, and per-user reporting.

The 30-second demo

# 1. Mint your master key in the dashboard, then:
export QLAUD_MASTER_KEY=qlk_live_...

# 2. Mint a key for user_42 with a $5 monthly cap
curl https://api.qlaud.ai/v1/keys \
  -H "x-api-key: $QLAUD_MASTER_KEY" \
  -H "content-type: application/json" \
  -d '{"name":"user_42","max_spend_usd":5}'
# → {"id":"...","secret":"qlk_live_abc...","scope":"standard","max_spend_usd":5}

# 3. user_42 makes requests with their qlk_live_abc... key. We enforce the cap.

# 4. End of month — pull spend per user
curl https://api.qlaud.ai/v1/usage -H "x-api-key: $QLAUD_MASTER_KEY"
# → {"by_key":[{"key_name":"user_42","cost_micros":2347000,...}], ...}
That’s it. Read the per-user billing quickstart for the full flow with Node + Python + Stripe wiring.
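The month-end step above can be sketched end to end. Assuming `cost_micros` means millionths of a dollar (consistent with the demo response, where 2,347,000 micros is $2.347), this hypothetical helper turns the `/v1/usage` payload into Stripe-style invoice items in cents; the field names on the output dicts are illustrative, not Stripe's actual parameter names:

```python
# Sample shape copied from the demo's /v1/usage response
usage = {
    "by_key": [
        {"key_name": "user_42", "cost_micros": 2347000},
        {"key_name": "user_43", "cost_micros": 512000},
    ]
}

def to_invoice_items(usage: dict) -> list[dict]:
    """Convert per-key spend in micro-dollars to amounts in cents."""
    items = []
    for row in usage["by_key"]:
        # 1,000,000 micros = $1.00 = 100 cents, so divide by 10,000
        cents = round(row["cost_micros"] / 10_000)
        items.append({"customer_ref": row["key_name"],
                      "amount": cents, "currency": "usd"})
    return items

print(to_invoice_items(usage))
```

From there, each item maps onto one Stripe invoice line per end-user key.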

Beyond billing — the app substrate

Once your end-users are minted as keys, qlaud manages the rest of the AI app stack so you don’t have to:

Threads

Conversation memory primitive. Send just the new turn — qlaud loads history server-side, persists both sides, returns the assistant response. Kills the messages table.
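The "send just the new turn" contract might look like this. A sketch only: the `thread_id` field name and its value are assumptions, not documented API; the point is that the client never replays history:

```python
# Hypothetical thread turn -- field names are assumed, not documented.
body = {
    "model": "claude-opus-4.7",      # assumed model id
    "thread_id": "th_user_42_main",  # hypothetical thread handle
    "messages": [                    # just this turn; history loads server-side
        {"role": "user", "content": "And what did we decide yesterday?"}
    ],
}
print(len(body["messages"]))  # always 1 per request, no client-side history
```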

Tools

Register a webhook URL once. When the assistant emits tool_use, qlaud calls your endpoint, awaits the result, re-calls the model. Cross-provider — same shape for Claude or GPT.
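Your endpoint only has to answer one tool call at a time. A minimal handler sketch, assuming the webhook payload mirrors Anthropic-style `tool_use` blocks (`id`, `name`, `input`); the `get_weather` tool and every field name here are illustrative:

```python
import json

def handle_tool_call(payload: dict) -> dict:
    """Receive one tool_use event, return the tool result for the re-call."""
    name = payload["name"]
    args = payload["input"]
    if name == "get_weather":  # hypothetical example tool
        result = {"temp_c": 21, "city": args["city"]}
    else:
        result = {"error": f"unknown tool: {name}"}
    # The gateway re-calls the model with whatever JSON we return here.
    return {"tool_use_id": payload["id"], "content": json.dumps(result)}

event = {"id": "toolu_01", "name": "get_weather", "input": {"city": "Berlin"}}
print(handle_tool_call(event)["content"])
```

Because the shape is the same for Claude and GPT traffic, one handler serves both.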

Search

Every turn auto-embedded into Cloudflare Vectorize. Query with plain text, get tenant-isolated semantic hits. No vector DB to provision.

Jobs

Async submit + polled retrieval for long-running batch work. Same request body as the synchronous endpoints, wrapped in /v1/jobs.
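The submit-then-poll loop is the same regardless of workload. A generic sketch; the status strings ("queued", "running", "done") are assumptions about the jobs payload, and the stub fetcher stands in for a real GET on the job:

```python
import time

def poll_job(fetch, interval_s=2.0, timeout_s=600.0):
    """Call fetch() until the job leaves a pending state or we time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = fetch()
        if job["status"] not in ("queued", "running"):
            return job
        time.sleep(interval_s)
    raise TimeoutError("job did not finish in time")

# Stub fetcher standing in for polling the job endpoint
states = iter([{"status": "queued"}, {"status": "running"},
               {"status": "done", "result": "ok"}])
print(poll_job(lambda: next(states), interval_s=0.0)["status"])  # done
```

Swap the stub for a real HTTP call with your master key and the loop is production-shaped: add jitter or backoff to `interval_s` as needed.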
For the full picture — building a complete chat product (per-user threads, tools, search, streaming UX, billing) end-to-end — see the Build a chat app tutorial.