Every assistant + user turn from your threads is automatically embedded (OpenAI text-embedding-3-large at 1536 dims) and indexed in Cloudflare Vectorize. You query with plain text; we embed it and run k-NN with metadata-filtered tenant isolation. No vector store to provision, no embedding pipeline to maintain. Search becomes available within a few seconds of each thread message persisting.

GET /v1/threads/:id/search — Within one thread

curl -G "https://api.qlaud.ai/v1/threads/$THREAD_ID/search" \
  --data-urlencode 'q=what did we discuss about Rust' \
  --data-urlencode 'limit=10' \
  -H "x-api-key: $QLAUD_API_KEY"

GET /v1/search — Across every thread you own

curl -G 'https://api.qlaud.ai/v1/search' \
  --data-urlencode 'q=refund policy questions' \
  --data-urlencode 'end_user_id=user_42' \
  --data-urlencode 'limit=10' \
  -H "x-api-key: $QLAUD_API_KEY"

Query

| Param | Default | Description |
|---|---|---|
| q | required | Natural-language search string. We embed it and find the closest message turns. |
| limit | 10 (max 50) | Top-K hits to return. |
| end_user_id | (none) | Narrow to one of your end-users (the value passed at thread create time). Account-wide search only. |

Response

{
  "object": "list",
  "query": "what did we discuss about Rust",
  "data": [
    {
      "thread_id": "39e6f5ac-...",
      "seq": 4,
      "role": "assistant",
      "score": 0.732,
      "snippet": "Rust's ownership model is a compile-time memory management system…",
      "created_at": 1777262998000
    },
    {
      "thread_id": "39e6f5ac-...",
      "seq": 3,
      "role": "user",
      "score": 0.666,
      "snippet": "Now explain Rust's ownership model in one sentence.",
      "created_at": 1777262995000
    }
  ]
}
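
Hits arrive ranked by score, not by position in the conversation. If you want to show matches in thread order, re-sort by seq client-side. A sketch using the sample payload above (snippets abbreviated):

```python
# Mirrors the documented response shape; field names are from the sample above.
resp = {
    "object": "list",
    "query": "what did we discuss about Rust",
    "data": [
        {"thread_id": "39e6f5ac-...", "seq": 4, "role": "assistant",
         "score": 0.732, "snippet": "Rust's ownership model is...",
         "created_at": 1777262998000},
        {"thread_id": "39e6f5ac-...", "seq": 3, "role": "user",
         "score": 0.666, "snippet": "Now explain Rust's ownership model in one sentence.",
         "created_at": 1777262995000},
    ],
}

# Re-sort ranked hits into conversational order within the thread.
in_thread_order = sorted(resp["data"], key=lambda hit: hit["seq"])
print([hit["seq"] for hit in in_thread_order])  # → [3, 4]
```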

Score

Cosine similarity, range -1.0 to 1.0. Roughly:
| Score | Meaning |
|---|---|
| > 0.6 | Strong match (same topic) |
| 0.3 – 0.6 | Related |
| 0.05 – 0.3 | Weakly related |
| < 0 | Anti-correlated (different topic) |
No threshold filtering is applied: the query returns the closest `limit` hits regardless of score, so you can render score badges in your UI.
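
Since the API returns the top hits unfiltered, any badge or cutoff logic lives client-side. A sketch that buckets a score into the rough bands above (the 0 – 0.05 band is unlabeled in the table, so "near zero" is an illustrative choice):

```python
def score_label(score: float) -> str:
    # Bands follow the rough guide in the score table above.
    if score > 0.6:
        return "strong match"
    if score >= 0.3:
        return "related"
    if score >= 0.05:
        return "weakly related"
    if score >= 0.0:
        return "near zero"   # unlabeled gap in the table; illustrative
    return "anti-correlated"

print(score_label(0.732))  # → strong match
```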

Snippets

snippet is a ~240-char excerpt centered on the first matching token of your query. For short messages, the full content is returned.
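
To illustrate the windowing, here is a rough client-side approximation of that behavior. The server's exact tokenization and centering are internal; this sketch assumes simple whitespace tokens.

```python
def make_snippet(content: str, query: str, width: int = 240) -> str:
    # Short messages come back whole, per the docs.
    if len(content) <= width:
        return content
    # Find the first query token that appears in the content (assumption:
    # whitespace tokenization; the server's actual matching may differ).
    lowered = content.lower()
    pos = -1
    for token in query.lower().split():
        pos = lowered.find(token)
        if pos != -1:
            break
    if pos == -1:
        pos = 0
    # Center a ~width-char window on the match.
    start = max(0, pos - width // 2)
    return content[start:start + width]
```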

What gets embedded

  • Every user turn (the content you sent).
  • Every FINAL assistant turn (the model’s text response).
  • Tool-loop intermediate turns (the tool_use + tool_result blocks inside a multi-turn dispatch) are NOT embedded — they’re internal state and would pollute search with non-prose noise.
  • Image/file/tool blocks within content are stripped at embed time; only text blocks contribute.
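
The stripping rule in the last bullet can be sketched as follows. The block shape (`{"type": ..., "text": ...}`) is an assumption about the message wire format, not a documented qlaud structure:

```python
def embeddable_text(content) -> str:
    # Plain-string content embeds as-is.
    if isinstance(content, str):
        return content
    # For block lists, only text blocks contribute; image/file/tool
    # blocks are dropped at embed time (per the bullets above).
    parts = [b.get("text", "") for b in content if b.get("type") == "text"]
    return "\n".join(p for p in parts if p)
```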

Tenant isolation

Every embedding is tagged with:
  • user_id (your qlaud account)
  • thread_id
  • seq, role, created_at
  • end_user_id if the thread was tagged with one at create time
The Vectorize metadata filter always scopes queries to the caller's user_id, and optionally narrows by thread_id and/or end_user_id. Other qlaud customers' data is never visible.

Errors

| Status | Meaning |
|---|---|
| 400 | Missing q |
| 401 | Bad / revoked qlk key |
| 404 | Thread not found OR not owned by caller (per-thread search only) |

Limits (v1)

  • Embedding takes a few seconds after a turn persists; very recent turns may not be searchable yet.
  • The end_user_id filter relies on a Vectorize metadata index that is already created at the qlaud account level; no setup is needed on your end.
  • Cross-account search is impossible by design.
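
Because indexing lags persistence by a few seconds, code that searches right after writing a turn should poll briefly. A sketch: `search` is any callable returning the documented hit list, and identifying the new turn by seq is an illustrative choice.

```python
import time

def wait_until_indexed(search, q: str, seq: int,
                       attempts: int = 5, delay: float = 1.0):
    # Poll until the freshly-persisted turn shows up in results,
    # then return the hit list; give up after `attempts` tries.
    for _ in range(attempts):
        hits = search(q)
        if any(h["seq"] == seq for h in hits):
            return hits
        time.sleep(delay)
    return None
```

In practice a one- or two-second delay is usually enough; treat `None` as "not indexed yet", not as an error.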