Chat
How do I start a chat session?
Use POST /api/memories/{id}/chat with an OpenAI-compatible messages array. The response is a Server-Sent Events (SSE) stream delivering LLM tokens, tool calls, and rich A2UI surfaces.
The chat endpoint turns any Areev context database into a conversational interface. The LLM automatically recalls relevant grains as context and can execute tool calls to search, add, or modify grains during the conversation. This is the primary way people interact with AI memory through the Areev App UI: they ask questions in natural language, and the LLM retrieves, summarizes, and acts on stored knowledge.
The model and provider fields are optional. If omitted, Areev uses the LLM settings configured on the server. Supported providers include OpenAI, Anthropic, and Ollama. The server proxies requests to the LLM provider so API keys never reach the browser. By default, 10 context grains are assembled per request (configurable up to 100 via context_limit), and the LLM can execute multiple sequential tool rounds per turn.
import requests

# Stream the chat response as Server-Sent Events.
response = requests.post(
    "https://acme.areev.ai/api/memories/knowledge-base/chat",
    json={
        "messages": [
            {"role": "user", "content": "What does John like?"}
        ]
    },
    stream=True,
)
response.raise_for_status()

# Print each non-empty SSE line as it arrives.
for line in response.iter_lines():
    if line:
        print(line.decode())
POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"}
  ],
  "model": "gpt-4o",
  "provider": "openai"
}
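The optional fields described above can be gathered in a small helper. This is a minimal sketch, not part of any Areev client: `build_chat_payload` is a hypothetical name, and the sketch assumes that `context_limit` is sent in the request body next to `model` and `provider` and that its lower bound is 1. Neither assumption is confirmed by the text above.

```python
from typing import Optional


def build_chat_payload(
    messages: list[dict],
    model: Optional[str] = None,
    provider: Optional[str] = None,
    context_limit: Optional[int] = None,
) -> dict:
    """Build a chat request body, leaving out optional fields that are unset.

    Assumption: context_limit travels in the JSON body next to model/provider.
    """
    if not messages:
        raise ValueError("messages must contain at least one entry")
    # The documented maximum is 100; the minimum of 1 is an assumption.
    if context_limit is not None and not 1 <= context_limit <= 100:
        raise ValueError("context_limit must be between 1 and 100")

    payload: dict = {"messages": messages}
    # Omitted fields fall back to the LLM settings configured on the server.
    if model is not None:
        payload["model"] = model
    if provider is not None:
        payload["provider"] = provider
    if context_limit is not None:
        payload["context_limit"] = context_limit
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the Python example above.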
What SSE events does the stream return?
The SSE stream delivers six event types that let frontends render text, tool activity, pipeline progress, and rich A2UI surfaces in real time.
Each event arrives as a data: line in the SSE stream with a JSON payload. The text event carries individual LLM tokens for streaming display (concatenate delta values to form the full message). The tool_call event fires when the LLM decides to execute a memory operation, followed by a tool_result event when execution completes. The pipeline event reports per-stage progress for multi-stage tools (e.g. CAL query, recall, compliance checks), carrying the stage name, status, optional duration, and stage-specific data. The a2ui event delivers structured UI components (memory cards, dashboards, compliance summaries) in A2UI v0.9 JSONL format. The done event signals stream completion with metadata including latency and context grain count.
Frontends should handle all six event types for a complete experience. The a2ui surfaces are optional — if your client does not support rich rendering, the text events alone carry the full conversational response.
| Event | Payload | When |
|---|---|---|
| text | {"delta": "..."} | LLM generates a text token (deltas concatenate to the full message) |
| tool_call | {"id", "name"} | LLM requests a memory operation (arguments arrive in the matching tool_result) |
| tool_result | {"id", "name", "content"?, "error"?, "is_error"?} | Tool execution completes |
| pipeline | {"pipeline_id", "stage", "status", "duration_ms"?, "data"?, "error"?} | Multi-stage tool reports stage start, progress, or completion |
| a2ui | A2UI v0.9 JSONL | Tool returns a rich UI surface |
| done | {"conversation_id", "model", "provider", "latency_ms", "context_grains", "input_tokens", "output_tokens", "usage": {"input_tokens", "output_tokens", "cost_usd"}, "thread_id"?, "thread_title"?} | LLM stream complete. On LLM-backed turns, token and cost telemetry is always present. thread_id is included when the request carried one; thread_title is included only once the engine has generated a title for that thread (subsequent turns omit it). |
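A client might consume the stream roughly as sketched below. The text does not say how the event name is transmitted, so this sketch assumes standard SSE framing: an `event:` line naming the type, a `data:` line carrying the payload, and a blank line ending each event. `parse_sse` and `collect_text` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, raw_data) pairs from decoded SSE lines.

    Assumption: the event type arrives on an `event:` line and the payload
    on one or more `data:` lines, with a blank line ending each event.
    """
    event, data = "message", []
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            yield event, "\n".join(data)
            event, data = "message", []
    if data:  # stream ended without a trailing blank line
        yield event, "\n".join(data)


def collect_text(events: Iterable[tuple[str, str]]) -> str:
    """Concatenate the delta values of text events into the full message."""
    return "".join(
        json.loads(raw)["delta"] for name, raw in events if name == "text"
    )
```

The payload is left as a raw string so that a2ui events, which carry JSONL rather than a single JSON object, can be decoded line by line by the caller. With `requests`, the input could be `(line.decode() for line in response.iter_lines())`, without filtering out the blank lines that separate events.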
When a request supplies a user_action (e.g. view_detail, forget_grain) the engine bypasses the LLM entirely. In that path the done payload collapses to a single field — {"action": "<action-name>"} — and the token-telemetry fields above are absent. Clients should branch on whether action is present rather than assuming every done event carries usage data.
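The branch described above could look like the following sketch. `summarize_done` is a hypothetical helper that takes an already decoded done payload; the field names come from the table above, and `.get` is used throughout so that a missing field yields `None` instead of raising.

```python
def summarize_done(payload: dict) -> str:
    """Describe a done payload, branching on the user_action shortcut."""
    if "action" in payload:
        # user_action path: the LLM was bypassed, so no usage data exists.
        return f"action {payload['action']} completed without an LLM call"
    usage = payload.get("usage", {})
    return (
        f"{payload.get('model')} via {payload.get('provider')}: "
        f"{usage.get('input_tokens')} in / {usage.get('output_tokens')} out, "
        f"{payload.get('latency_ms')} ms"
    )
```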
Terminal failures surface inside tool_result (is_error: true, error field) or inside pipeline (status: "error", error field). There is no top-level error event — clients should watch those two channels for failure signal.
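A minimal sketch of watching those two channels follows. `failure_message` is a hypothetical helper; it expects the event name and the decoded JSON payload, and how the event name is obtained depends on the stream framing, which this page does not specify.

```python
from typing import Optional


def failure_message(event: str, payload: dict) -> Optional[str]:
    """Return an error description if the event reports a failure, else None."""
    if event == "tool_result" and payload.get("is_error"):
        return f"tool {payload.get('name')} failed: {payload.get('error')}"
    if event == "pipeline" and payload.get("status") == "error":
        return f"pipeline stage {payload.get('stage')} failed: {payload.get('error')}"
    # Every other event, including successful tool results, is not a failure.
    return None
```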
What tools can the LLM use?
The chat engine provides 9 built-in tools that let the LLM read, write, and inspect the autonomous memory during conversation. (Source readers may notice a tenth MemoryRecallTool impl in src/chat/tools.rs — it is intentionally not registered; cal_query is the single search entry point and falls back to recall internally when CAL parsing fails.)
The LLM selects tools based on conversational context. When a user asks “What does John like?”, the LLM calls cal_query, the single entry point for search. When a user says “Remember that John likes coffee”, the LLM calls memory_add. Tool execution goes through the same code path as direct API calls, so all policy checks, audit logging, and authorization rules apply.
These tools give the LLM full CRUD access to the context database within the constraints of the active policy. If a compliance policy restricts certain operations, the tool call fails gracefully and the LLM reports the restriction to the user.
| Tool | Action | Returns |
|---|---|---|
| cal_query | Run a CAL search; falls back to recall on parse failure | Memory cards with scores |
| memory_add | Store a new belief grain | Confirmation with hash |
| memory_get | Retrieve a grain by hash | Detailed grain view |
| memory_forget | Delete a grain by hash | Confirmation |
| memory_supersede | Replace a grain with updated data | Confirmation with new hash |
| memory_stats | Show database statistics | Stats dashboard |
| memory_verify | Run compliance verification | Pass/fail summary |
| memory_detect_pii | Scan text for PII | Detection results |
| visualize | Render a knowledge graph or chart of recent grains | A2UI surface |
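A frontend that wants to display tool activity could pair each tool_call event with its tool_result, as in the sketch below. `ToolActivity` is a hypothetical class, and the sketch assumes the `id` field is identical on both events of a pair.

```python
class ToolActivity:
    """Track tool calls that have started but not yet produced a result."""

    def __init__(self) -> None:
        self.pending: dict[str, str] = {}  # call id -> tool name
        self.finished: list[tuple[str, bool]] = []  # (tool name, succeeded)

    def handle(self, event: str, payload: dict) -> None:
        """Update state from one decoded stream event; other events are ignored."""
        if event == "tool_call":
            self.pending[payload["id"]] = payload["name"]
        elif event == "tool_result":
            # Fall back to the name on the result if the call was never seen.
            name = self.pending.pop(payload["id"], payload.get("name", "unknown"))
            self.finished.append((name, not payload.get("is_error", False)))
```

A tool call blocked by policy would then appear in `finished` with `False`, while the explanation for the user still arrives through the text events.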
How does multi-turn conversation work?
Areev supports two conversation modes: persistent threads and ephemeral conversations.
Persistent threads are the primary mode used by the Areev App UI. Create a thread with POST /api/memories/{id}/chat/threads, then pass its thread_id in chat requests. Messages are stored in the thread and survive server restarts. You can list threads with GET /api/memories/{id}/chat/threads, view messages with GET /api/memories/{id}/chat/threads/{thread_id}/messages, rename threads, and delete them.
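The thread endpoints named above can be assembled as in this sketch. The helper names are hypothetical, the host is borrowed from the example request on this page, and two details are assumptions the text does not confirm: that `thread_id` is a top-level field of the chat request body, and that a threaded request only needs to carry the new user message.

```python
BASE_URL = "https://acme.areev.ai"  # host taken from the example request


def threads_url(memory_id: str) -> str:
    """URL used to create (POST) or list (GET) threads."""
    return f"{BASE_URL}/api/memories/{memory_id}/chat/threads"


def thread_messages_url(memory_id: str, thread_id: str) -> str:
    """URL used to view (GET) the messages stored in one thread."""
    return f"{threads_url(memory_id)}/{thread_id}/messages"


def threaded_chat_request(memory_id: str, thread_id: str, text: str) -> tuple[str, dict]:
    """Return the chat URL and a request body bound to a persistent thread.

    Assumption: thread_id is a top-level field of the chat request body.
    """
    url = f"{BASE_URL}/api/memories/{memory_id}/chat"
    body = {
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": text}],
    }
    return url, body
```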
Ephemeral conversations work by sending the full conversation history in the messages array without a thread_id. The engine assembles context from recalled memories, active policies, database statistics, and the conversation history into a system prompt for each turn. Client applications should maintain the message array and append each new user message and assistant response before sending the next request.
The conversation_id in the done event can be used to correlate turns across either mode.
POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"},
    {"role": "assistant", "content": "John likes coffee."},
    {"role": "user", "content": "Has that changed recently?"}
  ]
}
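For ephemeral conversations, the client-side bookkeeping might be as simple as the sketch below. `EphemeralConversation` is a hypothetical class, not part of any Areev SDK; it only maintains the message array that each request sends in full.

```python
class EphemeralConversation:
    """Keep the full message history on the client between turns."""

    def __init__(self) -> None:
        self.messages: list[dict] = []

    def next_request(self, user_text: str) -> dict:
        """Append the user's message and return the body for the next request."""
        self.messages.append({"role": "user", "content": user_text})
        # Copy the list so later turns do not mutate a body already sent.
        return {"messages": list(self.messages)}

    def record_reply(self, assistant_text: str) -> None:
        """Store the assistant's full reply once its stream has completed."""
        self.messages.append({"role": "assistant", "content": assistant_text})
```

Calling `next_request`, then `record_reply` with the concatenated text deltas, then `next_request` again reproduces the three-message body shown in the request above.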
What are the security constraints?
Chat is blocked entirely when HIPAA policy is active. LLM API keys are stored in encrypted server-side settings and never exposed to the client.
HIPAA compliance requires that protected health information never be sent to third-party LLM providers, so Areev disables the chat endpoint while a HIPAA policy is in effect. When PII detection is enabled, user messages are scanned before they are sent to the LLM. Every tool execution carries the caller's AuthIdentity through the normal audit trail and policy checks.
| Constraint | Behavior |
|---|---|
| HIPAA policy active | Chat returns a policy violation error |
| PII detection enabled | User messages are scanned before sending to LLM |
| Auth required | AuthIdentity is threaded through all tool executions |
| Conversation storage | Persistent via threads, or ephemeral via message array |
| Tool execution | All tools go through normal audit trail and policy checks |
Related
- Search — direct search without LLM
- CAL — declarative query language
- Add and Query — the operations chat tools use