Chat

How do I start a chat session?

Use POST /api/memories/{id}/chat with an OpenAI-compatible messages array. The response is a Server-Sent Events (SSE) stream delivering LLM tokens, tool calls, and rich A2UI surfaces.

The chat endpoint turns any Areev context database into a conversational interface. The LLM automatically recalls relevant grains as context and can execute tool calls to search, add, or modify grains during the conversation. This is the primary way humans interact with AI memory through the Areev App UI — ask questions in natural language and the LLM retrieves, summarizes, and acts on stored knowledge.

The model and provider fields are optional. If omitted, Areev uses the LLM settings configured on the server. Supported providers include OpenAI, Anthropic, and Ollama. The server proxies requests to the LLM provider so API keys never reach the browser. By default, 10 context grains are assembled per request (configurable up to 100 via context_limit), and the LLM can execute multiple sequential tool rounds per turn.

import requests

response = requests.post(
    "http://localhost:4009/api/memories/knowledge-base/chat",
    json={
        "messages": [
            {"role": "user", "content": "What does John like?"}
        ]
    },
    stream=True
)

# Print raw SSE lines as they arrive; each event is a `data:` line
# carrying a JSON payload.
for line in response.iter_lines():
    if line:
        print(line.decode())

POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"}
  ],
  "model": "gpt-4o",
  "provider": "openai"
}

What SSE events does the stream return?

The SSE stream delivers six event types that let frontends render text, tool activity, and rich A2UI surfaces in real time.

Each event arrives as a data: line in the SSE stream with a JSON payload. The text event carries individual LLM tokens for streaming display. The tool_call event fires when the LLM decides to execute a memory operation, followed by a tool_result event when execution completes. The a2ui event delivers structured UI components (memory cards, dashboards, compliance summaries) in A2UI v0.9 JSONL format. The done event signals stream completion with metadata including latency and context grain count. The error event reports a failure mid-stream.

Frontends should handle all six event types for a complete experience. The a2ui surfaces are optional — if your client does not support rich rendering, the text events alone carry the full conversational response.

| Event | Payload | When |
|---|---|---|
| text | {"content": "..."} | LLM generates a text token |
| tool_call | {"id", "name", "arguments"} | LLM requests a memory operation |
| tool_result | {"tool_call_id", "content", "is_error"} | Tool execution completes |
| a2ui | A2UI v0.9 JSONL | Tool returns a rich UI surface |
| done | {"conversation_id", "model", "provider", "latency_ms", "context_grains"} | Stream complete |
| error | {"error": "..."} | Error occurred |
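The event table above can be turned into a small client-side dispatcher. The JSON payloads do not carry the event name, so this sketch assumes it arrives on a standard SSE event: line ahead of each data: line; parse_sse and the sample transcript values are illustrative, not part of the API.

```python
import json

def parse_sse(lines):
    # Pair each SSE `event:` name with the JSON payload on the
    # following `data:` line (assumed framing, see lead-in above).
    events, current = [], None
    for line in lines:
        if line.startswith("event: "):
            current = line[len("event: "):]
        elif line.startswith("data: "):
            events.append((current, json.loads(line[len("data: "):])))
            current = None
    return events

# Sample transcript with hypothetical payload values.
sample = [
    "event: text",
    'data: {"content": "John likes "}',
    "event: text",
    'data: {"content": "coffee."}',
    "event: done",
    'data: {"conversation_id": "abc123", "latency_ms": 412, "context_grains": 10}',
]

text = ""
for name, payload in parse_sse(sample):
    if name == "text":
        text += payload["content"]   # stream tokens into the transcript
    elif name == "done":
        meta = payload               # latency, context grain count, etc.

print(text)  # -> John likes coffee.
```

In a real client, the lines would come from response.iter_lines() on the streaming POST shown earlier rather than a list.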

What tools can the LLM use?

The chat engine provides 8 built-in tools that let the LLM read, write, and inspect the autonomous memory during conversation.

The LLM selects tools based on conversational context. When a user asks “what does John like?”, the LLM calls memory_recall with subject="john". When a user says “remember that john likes coffee”, the LLM calls memory_add. Tool execution goes through the same code path as direct API calls, so all policy checks, audit logging, and authorization rules apply.

These tools give the LLM full CRUD access to the context database within the constraints of the active policy. If a compliance policy restricts certain operations, the tool call fails gracefully and the LLM reports the restriction to the user.

| Tool | Action | Returns |
|---|---|---|
| memory_recall | Search memories by query, subject, or filters | Memory cards with scores |
| memory_add | Store a new belief grain | Confirmation with hash |
| memory_get | Retrieve a grain by hash | Detailed grain view |
| memory_forget | Delete a grain by hash | Confirmation |
| memory_supersede | Replace a grain with updated data | Confirmation with new hash |
| memory_stats | Show database statistics | Stats dashboard |
| memory_verify | Run compliance verification | Pass/fail summary |
| memory_detect_pii | Scan text for PII | Detection results |
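As an illustration only, a recall during conversation might appear on the stream roughly as below. The payload fields are taken from the event table, but the ids, values, and event:-line framing are assumptions, not a recorded transcript.

```
event: tool_call
data: {"id": "call_1", "name": "memory_recall", "arguments": {"subject": "john"}}

event: tool_result
data: {"tool_call_id": "call_1", "content": "1 memory card: john likes coffee", "is_error": false}

event: text
data: {"content": "John likes coffee."}
```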

How does multi-turn conversation work?

Areev supports two conversation modes: persistent threads and ephemeral conversations.

Persistent threads are the primary mode used by the Areev App UI. Create a thread with POST /api/memories/{id}/chat/threads, then pass its thread_id in chat requests. Messages are stored in the thread and survive server restarts. You can list threads with GET /api/memories/{id}/chat/threads, view messages with GET /api/memories/{id}/chat/threads/{thread_id}/messages, rename threads, and delete them.
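A sketch of the persistent-thread flow, assuming the create-thread response returns the thread_id and that the chat request carries it in the body alongside messages; the "title" field name is a guess.

```python
def thread_chat_body(thread_id, content):
    # With a persistent thread the server stores the history, so each
    # request carries only the new user message plus the thread_id.
    return {
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": content}],
    }

def run():
    import requests  # network calls kept out of the payload helper
    base = "http://localhost:4009/api/memories/knowledge-base"
    # Create a thread, then chat inside it; messages persist across
    # server restarts.
    thread = requests.post(f"{base}/chat/threads",
                           json={"title": "John's preferences"}).json()
    return requests.post(f"{base}/chat",
                         json=thread_chat_body(thread["thread_id"],
                                               "What does John like?"),
                         stream=True)
```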

Ephemeral conversations work by sending the full conversation history in the messages array without a thread_id. The engine assembles context from recalled memories, active policies, database statistics, and the conversation history into a system prompt for each turn. Client applications should maintain the message array and append each new user message and assistant response before sending the next request.

The conversation_id in the done event can be used to correlate turns across either mode.

POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"},
    {"role": "assistant", "content": "John likes coffee."},
    {"role": "user", "content": "Has that changed recently?"}
  ]
}
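The ephemeral loop described above can be sketched as follows. collect_text assumes SSE event names arrive on event: lines ahead of each data: payload; both helper names are illustrative.

```python
import json

def collect_text(lines):
    # Concatenate tokens from text events, ignoring tool and done
    # payloads (assumed `event:`-line framing, see lead-in above).
    reply, current = "", None
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        if line.startswith("event: "):
            current = line[len("event: "):]
        elif line.startswith("data: ") and current == "text":
            reply += json.loads(line[len("data: "):])["content"]
    return reply

def converse(questions):
    import requests  # imported here to keep collect_text dependency-free
    url = "http://localhost:4009/api/memories/knowledge-base/chat"
    # Without a thread_id the client owns the history: append each user
    # message and assistant reply before sending the next request.
    messages = []
    for question in questions:
        messages.append({"role": "user", "content": question})
        response = requests.post(url, json={"messages": messages}, stream=True)
        messages.append({"role": "assistant",
                         "content": collect_text(response.iter_lines())})
    return messages
```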

What are the security constraints?

Chat is blocked entirely when HIPAA policy is active. LLM API keys are stored in encrypted server-side settings and never exposed to the client.

HIPAA compliance requires that protected health information never reach third-party LLM providers, so Areev disables the chat endpoint while a HIPAA policy is in effect. When PII detection is enabled, user messages are scanned before they are sent to the LLM. All tool executions thread the AuthIdentity through the normal audit trail and policy checks.

| Constraint | Behavior |
|---|---|
| HIPAA policy active | Chat returns a policy violation error |
| PII detection enabled | User messages are scanned before sending to the LLM |
| Auth required | AuthIdentity is threaded through all tool executions |
| Conversation storage | Persistent via threads, or ephemeral via the message array |
| Tool execution | All tools go through the normal audit trail and policy checks |
See also:
  • Search — direct search without LLM
  • CAL — declarative query language
  • Add and Query — the operations chat tools use