Every assistant + user turn from your threads is automatically embedded (OpenAI text-embedding-3-large at 1536 dims) and indexed in Cloudflare Vectorize. You query with plain text; we embed it and run k-NN with metadata-filtered tenant isolation. No vector store to provision, no embedding pipeline to maintain. Search becomes available within a few seconds of each thread message persisting.

GET /v1/threads/:id/search — Within one thread

curl -G "https://api.qlaud.ai/v1/threads/$THREAD_ID/search" \
  --data-urlencode 'q=what did we discuss about Rust' \
  --data-urlencode 'limit=10' \
  -H "x-api-key: $QLAUD_API_KEY"

GET /v1/search — Across every thread you own

curl -G 'https://api.qlaud.ai/v1/search' \
  --data-urlencode 'q=refund policy questions' \
  --data-urlencode 'end_user_id=user_42' \
  --data-urlencode 'limit=10' \
  -H "x-api-key: $QLAUD_API_KEY"

Query

| Param | Default | Description |
|---|---|---|
| q | required | Natural-language search string. We embed it and find the closest message turns. |
| limit | 10 (max 50) | Top-K hits to return. |
| end_user_id | (none) | Narrow to one of your end-users (the value passed at thread create time). Account-wide search only. |

Response

{
  "object": "list",
  "query": "what did we discuss about Rust",
  "data": [
    {
      "thread_id": "39e6f5ac-...",
      "seq": 4,
      "role": "assistant",
      "score": 0.732,
      "snippet": "Rust's ownership model is a compile-time memory management system…",
      "created_at": 1777262998000
    },
    {
      "thread_id": "39e6f5ac-...",
      "seq": 3,
      "role": "user",
      "score": 0.666,
      "snippet": "Now explain Rust's ownership model in one sentence.",
      "created_at": 1777262995000
    }
  ]
}
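
Hits arrive ranked by score, not by position in the conversation. If you want to show matches in thread order, re-sort by seq client-side. A sketch using the sample payload above (snippets abbreviated):

```python
# Mirrors the documented response shape; field names are from the sample above.
resp = {
    "object": "list",
    "query": "what did we discuss about Rust",
    "data": [
        {"thread_id": "39e6f5ac-...", "seq": 4, "role": "assistant",
         "score": 0.732, "snippet": "Rust's ownership model is...",
         "created_at": 1777262998000},
        {"thread_id": "39e6f5ac-...", "seq": 3, "role": "user",
         "score": 0.666, "snippet": "Now explain Rust's ownership model in one sentence.",
         "created_at": 1777262995000},
    ],
}

# Re-sort ranked hits into conversational order within the thread.
in_thread_order = sorted(resp["data"], key=lambda hit: hit["seq"])
print([hit["seq"] for hit in in_thread_order])  # → [3, 4]
```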

Score

Cosine similarity, range -1.0 to 1.0. Roughly:
| Score | Meaning |
|---|---|
| > 0.6 | Strong match (same topic) |
| 0.3 – 0.6 | Related |
| 0.05 – 0.3 | Weakly related |
| < 0 | Anti-correlated (different topic) |
No threshold filtering is applied: the query returns the closest `limit` hits regardless of score, so you can render score badges in your UI.
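
Since the API returns the top hits unfiltered, any badge or cutoff logic lives client-side. A sketch that buckets a score into the rough bands above (the 0 – 0.05 band is unlabeled in the table, so "near zero" is an illustrative choice):

```python
def score_label(score: float) -> str:
    # Bands follow the rough guide in the score table above.
    if score > 0.6:
        return "strong match"
    if score >= 0.3:
        return "related"
    if score >= 0.05:
        return "weakly related"
    if score >= 0.0:
        return "near zero"   # unlabeled gap in the table; illustrative
    return "anti-correlated"

print(score_label(0.732))  # → strong match
```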

Snippets

snippet is a ~240-char excerpt centered on the first matching token of your query. For short messages, the full content is returned.
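
To illustrate the windowing, here is a rough client-side approximation of that behavior. The server's exact tokenization and centering are internal; this sketch assumes simple whitespace tokens.

```python
def make_snippet(content: str, query: str, width: int = 240) -> str:
    # Short messages come back whole, per the docs.
    if len(content) <= width:
        return content
    # Find the first query token that appears in the content (assumption:
    # whitespace tokenization; the server's actual matching may differ).
    lowered = content.lower()
    pos = -1
    for token in query.lower().split():
        pos = lowered.find(token)
        if pos != -1:
            break
    if pos == -1:
        pos = 0
    # Center a ~width-char window on the match.
    start = max(0, pos - width // 2)
    return content[start:start + width]
```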

What gets embedded

  • Every user turn (the content you sent).
  • Every FINAL assistant turn (the model’s text response).
  • Tool-loop intermediate turns (the tool_use + tool_result blocks inside a multi-turn dispatch) are NOT embedded — they’re internal state and would pollute search with non-prose noise.
  • Image/file/tool blocks within content are stripped at embed time; only text blocks contribute.
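
The stripping rule in the last bullet can be sketched as follows. The block shape (`{"type": ..., "text": ...}`) is an assumption about the message wire format, not a documented qlaud structure:

```python
def embeddable_text(content) -> str:
    # Plain-string content embeds as-is.
    if isinstance(content, str):
        return content
    # For block lists, only text blocks contribute; image/file/tool
    # blocks are dropped at embed time (per the bullets above).
    parts = [b.get("text", "") for b in content if b.get("type") == "text"]
    return "\n".join(p for p in parts if p)
```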

Tenant isolation

Every embedding is tagged with:
  • user_id (your qlaud account)
  • thread_id
  • seq, role, created_at
  • end_user_id if the thread was tagged with one at create time
The Vectorize metadata filter always scopes queries to the caller's user_id, and optionally narrows by thread_id and/or end_user_id. Other qlaud customers' data is never visible.

Errors

| Status | Meaning |
|---|---|
| 400 | Missing q |
| 401 | Bad / revoked qlk key |
| 404 | Thread not found OR not owned by caller (per-thread search only) |

Limits (v1)

  • Embedding takes a few seconds after a turn persists; very recent turns may not be searchable yet.
  • The end_user_id filter relies on a Vectorize metadata index that is already created at the qlaud account level; no setup is needed on your end.
  • Cross-account search is impossible by design.
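
Because indexing lags persistence by a few seconds, code that searches right after writing a turn should poll briefly. A sketch: `search` is any callable returning the documented hit list, and identifying the new turn by seq is an illustrative choice.

```python
import time

def wait_until_indexed(search, q: str, seq: int,
                       attempts: int = 5, delay: float = 1.0):
    # Poll until the freshly-persisted turn shows up in results,
    # then return the hit list; give up after `attempts` tries.
    for _ in range(attempts):
        hits = search(q)
        if any(h["seq"] == seq for h in hits):
            return hits
        time.sleep(delay)
    return None
```

In practice a one- or two-second delay is usually enough; treat `None` as "not indexed yet", not as an error.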