qlaud’s /v1/messages endpoint is a native passthrough for Anthropic upstreams. The request body is forwarded verbatim: cache_control: ephemeral markers, image content blocks, and thinking blocks are all preserved.

Python

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.qlaud.ai",
    api_key="qlk_live_...",
)

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "hello"}],
)
print(msg.content[0].text)

Node

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'https://api.qlaud.ai',
  apiKey: 'qlk_live_...',
});

const msg = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'hello' }],
});
console.log(msg.content);

Prompt cache (the headline feature)

cache_control markers are forwarded to Anthropic verbatim. Tag a system block as ephemeral once, and every subsequent turn saves roughly 75% of its input cost.
msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "hello"}],
)
The first call writes the cache. Every subsequent call within ~5 minutes reads from cache at ~25% of the input price.
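You can confirm cache behavior from the usage block on each response: the Anthropic SDK reports cache_creation_input_tokens (the cache write) and cache_read_input_tokens (the cache hit). A minimal sketch follows; the cache_hit helper and the sample usage dicts are illustrative, not part of any SDK.

```python
def cache_hit(usage: dict) -> bool:
    """Return True when this turn read the system prompt from cache."""
    return usage.get("cache_read_input_tokens", 0) > 0

# Shapes modeled on the usage block of a Messages response (sample numbers).
first_turn = {"input_tokens": 12, "cache_creation_input_tokens": 2048,
              "cache_read_input_tokens": 0}
later_turn = {"input_tokens": 12, "cache_creation_input_tokens": 0,
              "cache_read_input_tokens": 2048}

print(cache_hit(first_turn))  # False: this call wrote the cache
print(cache_hit(later_turn))  # True: this call read it back
```

If a turn lands outside the cache window, cache_read_input_tokens drops back to 0 and the next call pays the write cost again.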

Cross-provider via Anthropic shape

You can pass non-Anthropic model slugs too — qlaud translates the body to the upstream’s shape automatically:
msg = client.messages.create(
    model="gpt-5.4",          # routed to OpenAI
    max_tokens=1024,
    messages=[{"role": "user", "content": "hello"}],
)
For pure Anthropic calls (Opus, Sonnet, Haiku) the request flows verbatim with no translation overhead.
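For a sense of what that translation involves, here is a rough sketch of mapping an Anthropic-shaped body to an OpenAI-style chat payload. This illustrates the idea only; it is not qlaud's actual implementation, which handles many more fields.

```python
def anthropic_to_openai(body: dict) -> dict:
    """Sketch: Anthropic Messages body -> OpenAI-style chat body.
    Illustrative only, covering just model, max_tokens, system, messages."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-style APIs expect it as the first chat message.
    if "system" in body:
        messages.append({"role": "system", "content": body["system"]})
    messages.extend(body["messages"])
    return {
        "model": body["model"],
        "max_tokens": body["max_tokens"],
        "messages": messages,
    }

translated = anthropic_to_openai({
    "model": "gpt-5.4",
    "max_tokens": 1024,
    "system": "You are terse.",
    "messages": [{"role": "user", "content": "hello"}],
})
print(translated["messages"][0]["role"])  # system
```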

Tool use

Tool use is native and uses the same Anthropic shape: tool definitions, tool_use content blocks, and tool_result content blocks are all forwarded unchanged.
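A sketch of the round trip in Anthropic shape. The tool name, schema, and result here are invented for illustration; the block structure (a tool_use block in the assistant reply, answered by a tool_result block in the next user message) is the part that forwards unchanged.

```python
# A tool definition in Anthropic shape (name and schema are made up).
weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# When the model calls the tool, the assistant reply's content contains
# a tool_use block like this (sample id and input):
assistant_content = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Oslo"}},
]

# You run the tool yourself, then return the output as a tool_result
# block inside the next user message, keyed by the tool_use id:
def tool_result_message(tool_use: dict, result: str) -> dict:
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use["id"],
            "content": result,
        }],
    }

reply = tool_result_message(assistant_content[0], "4°C, overcast")
print(reply["content"][0]["tool_use_id"])  # toolu_01
```

Append that message to the conversation and call client.messages.create again with the same tools list to let the model finish its answer.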