Chat


How do I start a chat session?

Use POST /api/memories/{id}/chat/stream with an OpenAI-compatible messages array. The response is a Server-Sent Events (SSE) stream delivering LLM tokens, tool calls, and rich A2UI surfaces. (A sibling POST /api/memories/{id}/chat returns a single non-streaming JSON reply — use it only when you do not want SSE.)

The chat endpoint turns any Areev context database into a conversational interface. The LLM automatically recalls relevant grains as context and can execute tool calls to search, add, or modify grains during the conversation. This is the primary way humans interact with AI memory through the Areev App UI — ask questions in natural language and the LLM retrieves, summarizes, and acts on stored knowledge.

The model and provider fields are optional. If omitted, Areev uses the LLM settings configured on the server. Supported providers include OpenAI, Anthropic, and Ollama. The server proxies requests to the LLM provider so API keys never reach the browser. By default, 10 context grains are assembled per request (configurable up to 100 via context_limit), and the LLM can execute multiple sequential tool rounds per turn.

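import requests

# Open the SSE stream; stream=True stops requests from buffering the whole body.
response = requests.post(
    "https://acme.areev.ai/api/memories/knowledge-base/chat/stream",
    json={
        "messages": [
            {"role": "user", "content": "What does John like?"}
        ]
    },
    stream=True,
)
response.raise_for_status()  # surface 4xx/5xx errors before reading the stream

# Print every non-empty SSE line as it arrives (payload lines start with "data:").
for line in response.iter_lines():
    if line:
        print(line.decode())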

POST /api/memories/knowledge-base/chat/stream
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"}
  ],
  "model": "gpt-4o",
  "provider": "openai"
}
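
The optional fields described above can be assembled with a small helper. This is a minimal sketch, not part of the Areev API: `build_chat_body` is a hypothetical name, and it assumes that `context_limit` is a top-level field of the request body and that 1 is its lower bound. Only the upper bound of 100 comes from the text above.

```python
from typing import Optional

MAX_CONTEXT_LIMIT = 100  # upper bound stated in the documentation above


def build_chat_body(
    messages: list[dict],
    model: Optional[str] = None,
    provider: Optional[str] = None,
    context_limit: Optional[int] = None,
) -> dict:
    """Build a chat request body, leaving out optional fields that are unset."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    body: dict = {"messages": messages}
    # Omitted model/provider fall back to the server-side LLM settings.
    if model is not None:
        body["model"] = model
    if provider is not None:
        body["provider"] = provider
    if context_limit is not None:
        # Assumption: context_limit is a top-level body field with a minimum of 1.
        if not 1 <= context_limit <= MAX_CONTEXT_LIMIT:
            raise ValueError(
                f"context_limit must be between 1 and {MAX_CONTEXT_LIMIT}"
            )
        body["context_limit"] = context_limit
    return body
```

The returned dictionary can be passed as the `json=` argument of `requests.post`. Validating the limit on the client avoids a round trip for a request the server would likely reject.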

What SSE events does the stream return?

The SSE stream delivers six event types that let frontends render text, tool activity, pipeline progress, and rich A2UI surfaces in real time.

Each event arrives as a data: line in the SSE stream with a JSON payload. The text event carries individual LLM tokens for streaming display (concatenate delta values to form the full message). The tool_call event fires when the LLM decides to execute a memory operation, followed by a tool_result event when execution completes. The pipeline event reports per-stage progress for multi-stage tools (e.g. CAL query, recall, compliance checks), carrying the stage name, status, optional duration, and stage-specific data. The a2ui event delivers structured UI components (memory cards, dashboards, compliance summaries) in A2UI v0.9 JSONL format. The done event signals stream completion with metadata including latency and context grain count.

Frontends should handle all six event types for a complete experience. The a2ui surfaces are optional — if your client does not support rich rendering, the text events alone carry the full conversational response.

Event	Payload	When
`text`	`{"delta": "..."}`	LLM generates a text token (deltas concatenate to the full message)
`tool_call`	`{"id", "name"}`	LLM requests a memory operation (arguments arrive in the matching `tool_result`)
`tool_result`	`{"id", "name", "content"?, "error"?, "is_error"?}`	Tool execution completes
`pipeline`	`{"pipeline_id", "stage", "status", "duration_ms"?, "data"?, "error"?}`	Multi-stage tool reports stage start, progress, or completion
`a2ui`	A2UI v0.9 JSONL	Tool returns a rich UI surface
`done`	`{"conversation_id", "model", "provider", "latency_ms", "context_grains", "input_tokens", "output_tokens", "usage": {"input_tokens", "output_tokens", "cost_usd"}, "thread_id"?, "thread_title"?}`	LLM stream complete. Token + cost telemetry is always present. `thread_id` is included when the request carried one; `thread_title` is included only once the engine has generated a title for that thread (subsequent turns omit it).
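
A client can turn the raw stream into typed events with a small parser. The sketch below follows the generic SSE framing rules (blank line ends an event, `data:` lines carry the payload). How the event name is transmitted is not specified above, so the code makes an assumption: it reads the name from an SSE `event:` field when present and otherwise from a hypothetical `type` key inside the JSON payload. `iter_sse_events` and `collect_text` are illustrative names.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


def _finish(event_name: str, data_lines: List[str]) -> Tuple[str, Any]:
    """Join the data lines of one event and decode them as JSON if possible."""
    text = "\n".join(data_lines)
    try:
        payload: Any = json.loads(text)
    except json.JSONDecodeError:
        payload = text  # keep raw text, e.g. multi-line JSONL for a2ui surfaces
    # Assumption: without an `event:` field, the name lives in payload["type"].
    if not event_name and isinstance(payload, dict):
        event_name = str(payload.get("type", ""))
    return event_name, payload


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, payload) pairs from already-decoded SSE lines."""
    event_name, data_lines = "", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield _finish(event_name, data_lines)
            event_name, data_lines = "", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        yield _finish(event_name, data_lines)  # stream ended without a blank line


def collect_text(events: Iterable[Tuple[str, Any]]) -> str:
    """Concatenate the delta values of all text events into the full message."""
    return "".join(
        payload["delta"]
        for name, payload in events
        if name == "text" and isinstance(payload, dict) and "delta" in payload
    )
```

With `requests`, the input can be `(raw.decode("utf-8") for raw in response.iter_lines())`. Blank lines should not be filtered out before parsing, because they mark event boundaries.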

When a request supplies a user_action (e.g. view_detail, forget_grain) the engine bypasses the LLM entirely. In that path the done payload collapses to a single field — {"action": "<action-name>"} — and the token-telemetry fields above are absent. Clients should branch on whether action is present rather than assuming every done event carries usage data.
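
The branching described above might look like the following. This is a sketch under the payload shapes listed in the event table; `TurnSummary` and `summarize_done` are hypothetical client-side names, not part of the Areev API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnSummary:
    """Client-side view of a done payload (hypothetical helper type)."""
    action: Optional[str] = None         # set only on the user_action path
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    cost_usd: Optional[float] = None
    latency_ms: Optional[int] = None


def summarize_done(payload: dict) -> TurnSummary:
    """Branch on the presence of action instead of assuming usage data."""
    if "action" in payload:
        # user_action path: the payload is just {"action": "<action-name>"}.
        return TurnSummary(action=payload["action"])
    # LLM path: read telemetry from the nested usage object.
    usage = payload.get("usage", {})
    return TurnSummary(
        input_tokens=usage.get("input_tokens"),
        output_tokens=usage.get("output_tokens"),
        cost_usd=usage.get("cost_usd"),
        latency_ms=payload.get("latency_ms"),
    )
```

Using `.get` throughout keeps the client tolerant of fields that are absent on one path or the other.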

Terminal failures surface inside tool_result (is_error: true, error field) or inside pipeline (status: "error", error field). There is no top-level error event, so clients should watch those two channels for failure signals.
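
Since failures arrive on two channels, a client may want a single check that covers both. The helper below is a minimal sketch that assumes the field names from the event table; `failure_message` is a hypothetical name.

```python
from typing import Optional


def failure_message(event_name: str, payload: dict) -> Optional[str]:
    """Return a readable failure description, or None if the event is healthy."""
    if event_name == "tool_result" and payload.get("is_error"):
        name = payload.get("name", "unknown tool")
        return f"tool {name} failed: {payload.get('error', 'no details')}"
    if event_name == "pipeline" and payload.get("status") == "error":
        stage = payload.get("stage", "unknown stage")
        return f"pipeline stage {stage} failed: {payload.get('error', 'no details')}"
    return None
```

Calling this for every incoming event and surfacing any non-None result gives the user a failure notice without waiting for the done event.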

What tools can the LLM use?

The chat engine provides 9 core built-in tools that let the LLM read, write, and inspect the autonomous memory during conversation. When the CAL feature is enabled — which is the case on every shipping build — 8 additional saved-item tools register on top, bringing the total exposed to the LLM to 17. cal_query is the single search entry point and falls back to recall internally when CAL parsing fails.

The LLM selects tools based on conversational context. When a user asks “what does John like?”, the LLM calls cal_query, which handles all search routing. When a user says “remember that John likes coffee”, the LLM calls memory_add. Tool execution goes through the same code path as direct API calls, so all policy checks, audit logging, and authorization rules apply.

These tools give the LLM full CRUD access to the context database within the constraints of the active policy. If a compliance policy restricts certain operations, the tool call fails gracefully and the LLM reports the restriction to the user.

Tool	Action	Returns
`cal_query`	Run a CAL search; falls back to recall on parse failure	Memory cards with scores
`memory_add`	Store a new belief grain	Confirmation with hash
`memory_get`	Retrieve a grain by hash	Detailed grain view
`memory_forget`	Delete a grain by hash	Confirmation
`memory_supersede`	Replace a grain with updated data	Confirmation with new hash
`memory_stats`	Show database statistics	Stats dashboard
`memory_verify`	Run compliance verification	Pass/fail summary
`memory_detect_pii`	Scan text for PII	Detection results
`visualize`	Render a knowledge graph or chart of recent grains	A2UI surface
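
On the client side, tool activity shows up as a tool_call event followed by a tool_result event with the same id. The sketch below tracks in-flight tools so a UI can show which operation is running and whether it succeeded, for example when a policy blocks memory_forget. `ToolTracker` is a hypothetical helper; it assumes only the `id`, `name`, and `is_error` fields from the event table.

```python
from typing import Dict, List, Tuple


class ToolTracker:
    """Pair tool_call events with their tool_result by id."""

    def __init__(self) -> None:
        self.pending: Dict[str, str] = {}            # id -> tool name
        self.finished: List[Tuple[str, bool]] = []   # (tool name, succeeded)

    def handle(self, event_name: str, payload: dict) -> None:
        if event_name == "tool_call":
            self.pending[payload["id"]] = payload["name"]
        elif event_name == "tool_result":
            # Fall back to the name in the result if the call was never seen.
            name = self.pending.pop(payload["id"], payload.get("name", ""))
            self.finished.append((name, not payload.get("is_error", False)))


tracker = ToolTracker()
tracker.handle("tool_call", {"id": "t1", "name": "cal_query"})
tracker.handle("tool_result", {"id": "t1", "name": "cal_query", "content": "..."})
print(tracker.finished)  # [('cal_query', True)]
```

Keeping pending calls in a dictionary also handles several sequential tool rounds within one turn.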

How does multi-turn conversation work?

Areev supports two conversation modes: persistent threads and ephemeral conversations.

Persistent threads are the primary mode used by the Areev App UI. Create a thread with POST /api/memories/{id}/chat/threads, then pass its thread_id in chat requests. Messages are stored in the thread and survive server restarts. You can list threads with GET /api/memories/{id}/chat/threads, view messages with GET /api/memories/{id}/chat/threads/{thread_id}/messages, rename threads, and delete them.
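
The thread endpoints above can be wrapped in a few small helpers. This is a sketch with hypothetical function names. It assumes that `thread_id` is a top-level field of the chat request body and that only the newest user message needs to be sent because the thread stores earlier ones; neither detail is confirmed above, so check both against the API reference.

```python
from urllib.parse import quote


def threads_url(base_url: str, memory_id: str) -> str:
    """URL for creating (POST) or listing (GET) threads of one memory."""
    return f"{base_url.rstrip('/')}/api/memories/{quote(memory_id, safe='')}/chat/threads"


def thread_messages_url(base_url: str, memory_id: str, thread_id: str) -> str:
    """URL for reading the stored messages of one thread (GET)."""
    return f"{threads_url(base_url, memory_id)}/{quote(thread_id, safe='')}/messages"


def threaded_chat_body(thread_id: str, user_text: str) -> dict:
    """Chat request body bound to a persistent thread."""
    return {
        # Assumption: thread_id is passed as a top-level body field.
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": user_text}],
    }
```

Percent-encoding the identifiers keeps the path valid even if an id contains spaces or slashes.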

Ephemeral conversations work by sending the full conversation history in the messages array without a thread_id. The engine assembles context from recalled memories, active policies, database statistics, and the conversation history into a system prompt for each turn. Client applications should maintain the message array and append each new user message and assistant response before sending the next request.

The conversation_id in the done event can be used to correlate turns in either mode.

POST /api/memories/knowledge-base/chat/stream
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"},
    {"role": "assistant", "content": "John likes coffee."},
    {"role": "user", "content": "Has that changed recently?"}
  ]
}
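
For ephemeral conversations, the client owns the history. A minimal sketch of that bookkeeping follows; `EphemeralConversation` is a hypothetical helper, and the check that the last message comes from the user is a client-side sanity check, not a documented server rule.

```python
class EphemeralConversation:
    """Keep the full message history on the client and resend it each turn."""

    def __init__(self) -> None:
        self.messages: list[dict] = []

    def add_user(self, content: str) -> None:
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content: str) -> None:
        # Call this with the concatenated text deltas once the stream is done.
        self.messages.append({"role": "assistant", "content": content})

    def request_body(self) -> dict:
        """Body for the next chat request: the whole history, no thread_id."""
        if not self.messages or self.messages[-1]["role"] != "user":
            raise ValueError("the last message must come from the user")
        return {"messages": list(self.messages)}


conversation = EphemeralConversation()
conversation.add_user("What does John like?")
conversation.add_assistant("John likes coffee.")
conversation.add_user("Has that changed recently?")
body = conversation.request_body()
```

Here `body` matches the three-message request shown above and can be sent as the JSON payload of the next call.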

What are the security constraints?

Chat is blocked entirely when HIPAA policy is active. LLM API keys are stored in encrypted server-side settings and never exposed to the client.

HIPAA compliance requires that protected health information never transit to third-party LLM providers, so Areev disables the chat endpoint when HIPAA policy is in effect. PII detection, when enabled, scans user messages before they are sent to the LLM. Every tool execution carries the caller's AuthIdentity through the normal audit trail and policy checks.

Constraint	Behavior
HIPAA policy active	Chat returns a policy violation error
PII detection enabled	User messages are scanned before sending to LLM
Auth required	`AuthIdentity` is threaded through all tool executions
Conversation storage	Persistent via threads, or ephemeral via message array
Tool execution	All tools go through normal audit trail and policy checks

Search — direct search without LLM
CAL — declarative query language
Add and Query — the operations chat tools use