Chat


How do I start a chat session?

Use POST /api/memories/{id}/chat with an OpenAI-compatible messages array. The response is a Server-Sent Events (SSE) stream delivering LLM tokens, tool calls, and rich A2UI surfaces.

The chat endpoint turns any Areev context database into a conversational interface. The LLM automatically recalls relevant grains as context and can execute tool calls to search, add, or modify grains during the conversation. This is the primary way humans interact with AI memory through the Areev App UI — ask questions in natural language and the LLM retrieves, summarizes, and acts on stored knowledge.

The model and provider fields are optional. If omitted, Areev uses the LLM settings configured on the server. Supported providers include OpenAI, Anthropic, and Ollama. The server proxies requests to the LLM provider so API keys never reach the browser. By default, 10 context grains are assembled per request (configurable up to 100 via context_limit), and the LLM can execute multiple sequential tool rounds per turn.

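import requests

# stream=True lets us read the SSE body incrementally instead of buffering it.
response = requests.post(
    "https://acme.areev.ai/api/memories/knowledge-base/chat",
    json={
        "messages": [
            {"role": "user", "content": "What does John like?"}
        ]
    },
    stream=True,
)
response.raise_for_status()  # fail early on HTTP-level errors

# Print each non-empty SSE line as it arrives.
for line in response.iter_lines():
    if line:
        print(line.decode())

The equivalent raw HTTP request, here with the optional model and provider fields set:

POST /api/memories/knowledge-base/chat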
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"}
  ],
  "model": "gpt-4o",
  "provider": "openai"
}
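
The optional fields described above can be assembled with a small helper. This is a minimal sketch, not Areev's client library: `build_chat_body` is a hypothetical name, and it assumes `context_limit` is sent as a top-level field of the request body and that 1 is its lower bound (the text only states the upper bound of 100).

```python
from typing import Any, Dict, List, Optional

MAX_CONTEXT_LIMIT = 100  # upper bound stated in the text above


def build_chat_body(
    messages: List[Dict[str, str]],
    model: Optional[str] = None,
    provider: Optional[str] = None,
    context_limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a chat request body, leaving out optional fields that are unset."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    body: Dict[str, Any] = {"messages": list(messages)}
    if model is not None:
        body["model"] = model
    if provider is not None:
        body["provider"] = provider
    if context_limit is not None:
        # Assumption: context_limit is a top-level body field with a minimum of 1.
        if not 1 <= context_limit <= MAX_CONTEXT_LIMIT:
            raise ValueError(f"context_limit must be between 1 and {MAX_CONTEXT_LIMIT}")
        body["context_limit"] = context_limit
    return body
```

Omitting `model` and `provider` keeps the body minimal, so the server falls back to its configured LLM settings.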

What SSE events does the stream return?

The SSE stream delivers six event types that let frontends render text, tool activity, pipeline progress, and rich A2UI surfaces in real time.

Each event arrives as a data: line in the SSE stream with a JSON payload. The text event carries individual LLM tokens for streaming display (concatenate delta values to form the full message). The tool_call event fires when the LLM decides to execute a memory operation, followed by a tool_result event when execution completes. The pipeline event reports per-stage progress for multi-stage tools (e.g. CAL query, recall, compliance checks), carrying the stage name, status, optional duration, and stage-specific data. The a2ui event delivers structured UI components (memory cards, dashboards, compliance summaries) in A2UI v0.9 JSONL format. The done event signals stream completion with metadata including latency and context grain count.

Frontends should handle all six event types for a complete experience. The a2ui surfaces are optional — if your client does not support rich rendering, the text events alone carry the full conversational response.

| Event | Payload | When |
| --- | --- | --- |
| text | {"delta": "..."} | LLM generates a text token (deltas concatenate to the full message) |
| tool_call | {"id", "name"} | LLM requests a memory operation (arguments arrive in the matching tool_result) |
| tool_result | {"id", "name", "content"?, "error"?, "is_error"?} | Tool execution completes |
| pipeline | {"pipeline_id", "stage", "status", "duration_ms"?, "data"?, "error"?} | Multi-stage tool reports stage start, progress, or completion |
| a2ui | A2UI v0.9 JSONL | Tool returns a rich UI surface |
| done | {"conversation_id", "model", "provider", "latency_ms", "context_grains", "input_tokens", "output_tokens", "usage": {"input_tokens", "output_tokens", "cost_usd"}, "thread_id"?, "thread_title"?} | LLM stream complete. Token + cost telemetry is always present. thread_id is included when the request carried one; thread_title is included only once the engine has generated a title for that thread (subsequent turns omit it). |
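
A client might consume the stream along the following lines. This is a sketch under one labeled assumption: the event type is carried in a standard SSE `event:` field ahead of each `data:` line (the payloads listed above have no type field of their own). `parse_sse_events` and `collect_text` are hypothetical helper names. The data is yielded as a raw string so that a2ui JSONL payloads are not forced through a single JSON parse.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse_events(lines: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, raw_data) pairs from raw SSE lines.

    Assumption: the server names each event with an `event:` field.
    A blank line ends an event, as in standard SSE framing.
    """
    event_name = "message"  # SSE default when no event: field is present
    data_parts: List[str] = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "":
            if data_parts:
                yield event_name, "\n".join(data_parts)
            event_name, data_parts = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip())
    if data_parts:  # flush a final event that lacks a trailing blank line
        yield event_name, "\n".join(data_parts)


def collect_text(events: Iterable[Tuple[str, str]]) -> str:
    """Concatenate the delta values of all text events into the full message."""
    return "".join(
        json.loads(data)["delta"] for name, data in events if name == "text"
    )
```

With the `requests` library, `parse_sse_events(response.iter_lines())` should work on a streamed response, because `iter_lines()` yields empty byte strings for the blank separator lines.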

When a request supplies a user_action (e.g. view_detail, forget_grain) the engine bypasses the LLM entirely. In that path the done payload collapses to a single field — {"action": "<action-name>"} — and the token-telemetry fields above are absent. Clients should branch on whether action is present rather than assuming every done event carries usage data.
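
The branch described above can be sketched as follows. `summarize_done` is a hypothetical helper, and the shape of the dictionary it returns is an illustration only; the field names it reads are the ones listed for the done payload.

```python
from typing import Any, Dict


def summarize_done(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a done payload without assuming usage data is present."""
    if "action" in payload:
        # user_action path: the LLM was bypassed, so there is no telemetry.
        return {"kind": "user_action", "action": payload["action"]}
    usage = payload.get("usage", {})
    return {
        "kind": "llm_turn",
        "conversation_id": payload.get("conversation_id"),
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
        "cost_usd": usage.get("cost_usd"),
        # thread_title is only sent once the engine has generated a title.
        "thread_title": payload.get("thread_title"),
    }
```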

Terminal failures surface inside tool_result (is_error: true, error field) or inside pipeline (status: "error", error field). There is no top-level error event, so clients should watch those two channels for failure signals.
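
Because failures arrive on two channels, a client may want a single check that covers both. The sketch below uses only the fields named above; `failure_message` is a hypothetical helper and the message wording is illustrative.

```python
from typing import Any, Dict, Optional


def failure_message(event_name: str, payload: Dict[str, Any]) -> Optional[str]:
    """Return a readable failure description, or None if the event is not a failure."""
    if event_name == "tool_result" and payload.get("is_error"):
        return f"tool {payload.get('name')} failed: {payload.get('error')}"
    if event_name == "pipeline" and payload.get("status") == "error":
        return f"pipeline stage {payload.get('stage')} failed: {payload.get('error')}"
    return None
```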

What tools can the LLM use?

The chat engine provides 9 built-in tools that let the LLM read, write, and inspect the autonomous memory during conversation. (Source readers may notice a tenth MemoryRecallTool impl in src/chat/tools.rs — it is intentionally not registered; cal_query is the single search entry point and falls back to recall internally when CAL parsing fails.)

The LLM selects tools based on conversational context. When a user asks “What does John like?”, the LLM calls cal_query, which handles all search routing. When a user says “Remember that John likes coffee”, the LLM calls memory_add. Tool execution goes through the same code path as direct API calls, so all policy checks, audit logging, and authorization rules apply.

These tools give the LLM full CRUD access to the context database within the constraints of the active policy. If a compliance policy restricts certain operations, the tool call fails gracefully and the LLM reports the restriction to the user.

| Tool | Action | Returns |
| --- | --- | --- |
| cal_query | Run a CAL search; falls back to recall on parse failure | Memory cards with scores |
| memory_add | Store a new belief grain | Confirmation with hash |
| memory_get | Retrieve a grain by hash | Detailed grain view |
| memory_forget | Delete a grain by hash | Confirmation |
| memory_supersede | Replace a grain with updated data | Confirmation with new hash |
| memory_stats | Show database statistics | Stats dashboard |
| memory_verify | Run compliance verification | Pass/fail summary |
| memory_detect_pii | Scan text for PII | Detection results |
| visualize | Render a knowledge graph or chart of recent grains | A2UI surface |
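
On the client side, tool activity shows up as tool_call events followed by tool_result events with the same id. A frontend that displays "running" and "finished" states could pair them as sketched below. `ToolCallTracker` is a hypothetical class; it relies only on the id, name, and is_error fields described for those two events.

```python
from typing import Any, Dict, List


class ToolCallTracker:
    """Pair tool_call events with their tool_result events by id."""

    def __init__(self) -> None:
        self.pending: Dict[str, str] = {}          # id -> tool name
        self.completed: List[Dict[str, Any]] = []  # finished calls, in order

    def on_tool_call(self, payload: Dict[str, Any]) -> None:
        self.pending[payload["id"]] = payload["name"]

    def on_tool_result(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Fall back to the name in the result if the call was never seen.
        name = self.pending.pop(payload["id"], payload.get("name"))
        record = {
            "id": payload["id"],
            "name": name,
            "ok": not payload.get("is_error", False),
        }
        self.completed.append(record)
        return record
```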

How does multi-turn conversation work?

Areev supports two conversation modes: persistent threads and ephemeral conversations.

Persistent threads are the primary mode used by the Areev App UI. Create a thread with POST /api/memories/{id}/chat/threads, then pass its thread_id in chat requests. Messages are stored in the thread and survive server restarts. You can list threads with GET /api/memories/{id}/chat/threads, view messages with GET /api/memories/{id}/chat/threads/{thread_id}/messages, rename threads, and delete them.
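
The thread endpoints above can be wrapped in small URL helpers. This is a sketch with hypothetical function names. It also assumes that thread_id is sent as a top-level field next to messages and that only the newest user message needs to be included once a thread stores the history; neither detail is confirmed by the text.

```python
from typing import Any, Dict


def threads_url(base_url: str, memory_id: str) -> str:
    """URL for creating (POST) or listing (GET) threads."""
    return f"{base_url.rstrip('/')}/api/memories/{memory_id}/chat/threads"


def thread_messages_url(base_url: str, memory_id: str, thread_id: str) -> str:
    """URL for viewing the messages stored in one thread (GET)."""
    return f"{threads_url(base_url, memory_id)}/{thread_id}/messages"


def threaded_chat_body(thread_id: str, user_text: str) -> Dict[str, Any]:
    """Chat request body that targets a persistent thread.

    Assumption: thread_id is a top-level field of the chat request body.
    """
    return {
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": user_text}],
    }
```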

Ephemeral conversations work by sending the full conversation history in the messages array without a thread_id. The engine assembles context from recalled memories, active policies, database statistics, and the conversation history into a system prompt for each turn. Client applications should maintain the message array and append each new user message and assistant response before sending the next request.

The conversation_id in the done event can be used to correlate turns across either mode.

POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"},
    {"role": "assistant", "content": "John likes coffee."},
    {"role": "user", "content": "Has that changed recently?"}
  ]
}
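
For ephemeral conversations, the client owns the history. A minimal sketch of that bookkeeping follows; `EphemeralConversation` is a hypothetical class name, and the assistant text would come from concatenating the text deltas of the previous turn.

```python
from typing import Dict, List


class EphemeralConversation:
    """Keep the full message history on the client between turns."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, str]] = []

    def next_request(self, user_text: str) -> Dict[str, List[Dict[str, str]]]:
        """Append the user's message and return the body for the next request."""
        self.messages.append({"role": "user", "content": user_text})
        return {"messages": list(self.messages)}  # copy, so later appends do not alter it

    def record_reply(self, assistant_text: str) -> None:
        """Append the assistant's full reply once the stream has finished."""
        self.messages.append({"role": "assistant", "content": assistant_text})
```

Calling `next_request`, then `record_reply`, then `next_request` again reproduces the three-message body shown in the request above.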

What are the security constraints?

Chat is blocked entirely when HIPAA policy is active. LLM API keys are stored in encrypted server-side settings and never exposed to the client.

HIPAA compliance requires that protected health information never be transmitted to third-party LLM providers, so Areev disables the chat endpoint when HIPAA policy is in effect. PII detection, when enabled, scans user messages before they are sent to the LLM. Every tool execution carries the caller's AuthIdentity through the normal audit trail and policy checks.

| Constraint | Behavior |
| --- | --- |
| HIPAA policy active | Chat returns a policy violation error |
| PII detection enabled | User messages are scanned before sending to LLM |
| Auth required | AuthIdentity is threaded through all tool executions |
| Conversation storage | Persistent via threads, or ephemeral via message array |
| Tool execution | All tools go through normal audit trail and policy checks |

Related pages:

  • Search: direct search without LLM
  • CAL: declarative query language
  • Add and Query: the operations chat tools use