Chat
How do I start a chat session?
Use POST /api/memories/{id}/chat with an OpenAI-compatible messages array. The response is a Server-Sent Events (SSE) stream delivering LLM tokens, tool calls, and rich A2UI surfaces.
The chat endpoint turns any Areev context database into a conversational interface. The LLM automatically recalls relevant grains as context and can execute tool calls to search, add, or modify grains during the conversation. This is the primary way humans interact with AI memory through the Areev App UI — ask questions in natural language and the LLM retrieves, summarizes, and acts on stored knowledge.
The model and provider fields are optional. If omitted, Areev uses the LLM settings configured on the server. Supported providers include OpenAI, Anthropic, and Ollama. The server proxies requests to the LLM provider so API keys never reach the browser. By default, 10 context grains are assembled per request (configurable up to 100 via context_limit), and the LLM can execute multiple sequential tool rounds per turn.
```python
import requests

response = requests.post(
    "http://localhost:4009/api/memories/knowledge-base/chat",
    json={
        "messages": [
            {"role": "user", "content": "What does John like?"}
        ]
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        print(line.decode())
```
```http
POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"}
  ],
  "model": "gpt-4o",
  "provider": "openai"
}
```
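If you want more than the default 10 context grains assembled for a turn, the description above suggests a per-request override. A sketch, assuming context_limit is accepted as a top-level request field (the field placement is an assumption):

```http
POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "Summarize everything about John."}
  ],
  "context_limit": 25
}
```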
What SSE events does the stream return?
The SSE stream delivers six event types that let frontends render text, tool activity, and rich A2UI surfaces in real time.
Each event arrives as a data: line in the SSE stream with a JSON payload. The text event carries individual LLM tokens for streaming display. The tool_call event fires when the LLM decides to execute a memory operation, followed by a tool_result event when execution completes. The a2ui event delivers structured UI components (memory cards, dashboards, compliance summaries) in A2UI v0.9 JSONL format. The done event signals stream completion with metadata including latency and context grain count, and the error event reports any failure mid-stream.
Frontends should handle all six event types for a complete experience. The a2ui surfaces are optional — if your client does not support rich rendering, the text events alone carry the full conversational response.
| Event | Payload | When |
|---|---|---|
| text | {"content": "..."} | LLM generates a text token |
| tool_call | {"id", "name", "arguments"} | LLM requests a memory operation |
| tool_result | {"tool_call_id", "content", "is_error"} | Tool execution completes |
| a2ui | A2UI v0.9 JSONL | Tool returns a rich UI surface |
| done | {"conversation_id", "model", "provider", "latency_ms", "context_grains"} | Stream complete |
| error | {"error": "..."} | Error occurred |
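A minimal Python sketch of parsing and dispatching these events. It assumes each data: payload is a JSON object whose type field names the event; that envelope shape is an assumption, so adjust it to match the actual wire format:

```python
import json

def parse_sse_line(line: str):
    """Parse a single SSE 'data:' line into (event_type, payload).

    Returns None for blank lines, comments, and other non-data lines.
    Assumes each payload is JSON shaped like {"type": "...", ...};
    the envelope is an assumption, not confirmed by the API docs.
    """
    if not line.startswith("data:"):
        return None
    payload = json.loads(line[len("data:"):].strip())
    return payload.get("type"), payload

# One handler per event type; text alone carries the full reply
# if your client does not render a2ui surfaces.
handlers = {
    "text": lambda p: print(p["content"], end=""),
    "tool_call": lambda p: print(f"\n[tool] {p['name']}"),
    "tool_result": lambda p: None,   # optionally surface tool output
    "a2ui": lambda p: None,          # render rich surface if supported
    "done": lambda p: print(f"\n({p['latency_ms']} ms)"),
    "error": lambda p: print(f"\n[error] {p['error']}"),
}
```

A consuming loop would call parse_sse_line on each decoded line and dispatch non-None results through handlers.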
What tools can the LLM use?
The chat engine provides 8 built-in tools that let the LLM read, write, and inspect the autonomous memory during conversation.
The LLM selects tools based on conversational context. When a user asks “what does John like?”, the LLM calls memory_recall with subject="john". When a user says “remember that john likes coffee”, the LLM calls memory_add. Tool execution goes through the same code path as direct API calls, so all policy checks, audit logging, and authorization rules apply.
These tools give the LLM full CRUD access to the context database within the constraints of the active policy. If a compliance policy restricts certain operations, the tool call fails gracefully and the LLM reports the restriction to the user.
| Tool | Action | Returns |
|---|---|---|
| memory_recall | Search memories by query, subject, or filters | Memory cards with scores |
| memory_add | Store a new belief grain | Confirmation with hash |
| memory_get | Retrieve a grain by hash | Detailed grain view |
| memory_forget | Delete a grain by hash | Confirmation |
| memory_supersede | Replace a grain with updated data | Confirmation with new hash |
| memory_stats | Show database statistics | Stats dashboard |
| memory_verify | Run compliance verification | Pass/fail summary |
| memory_detect_pii | Scan text for PII | Detection results |
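Putting the event types and tool payloads together, a tool invocation might appear on the stream like this. The concrete values and the {"type": ...} envelope are illustrative, assembled from the payload fields in the tables above rather than from a captured trace:

```
data: {"type": "tool_call", "id": "call_1", "name": "memory_recall", "arguments": {"subject": "john"}}
data: {"type": "tool_result", "tool_call_id": "call_1", "content": "john likes coffee", "is_error": false}
```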
How does multi-turn conversation work?
Areev supports two conversation modes: persistent threads and ephemeral conversations.
Persistent threads are the primary mode used by the Areev App UI. Create a thread with POST /api/memories/{id}/chat/threads, then pass its thread_id in chat requests. Messages are stored in the thread and survive server restarts. You can list threads with GET /api/memories/{id}/chat/threads, view messages with GET /api/memories/{id}/chat/threads/{thread_id}/messages, rename threads, and delete them.
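A sketch of the thread flow in the same request style as above. The placeholder thread_id value thr_123 is hypothetical, and whether the chat request carries both thread_id and messages in this exact shape is an assumption:

```http
POST /api/memories/knowledge-base/chat/threads

POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "thread_id": "thr_123",
  "messages": [
    {"role": "user", "content": "What does John like?"}
  ]
}
```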
Ephemeral conversations work by sending the full conversation history in the messages array without a thread_id. The engine assembles context from recalled memories, active policies, database statistics, and the conversation history into a system prompt for each turn. Client applications should maintain the message array and append each new user message and assistant response before sending the next request.
The conversation_id in the done event can be used to correlate turns across either mode.
```http
POST /api/memories/knowledge-base/chat
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What does John like?"},
    {"role": "assistant", "content": "John likes coffee."},
    {"role": "user", "content": "Has that changed recently?"}
  ]
}
```
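A minimal ephemeral-mode client sketch in Python: the caller owns the messages array, appends the user message before each request, and appends the streamed assistant reply afterward. The {"type": "text", "content": ...} envelope on each data: payload is an assumption:

```python
import json
import requests

BASE = "http://localhost:4009/api/memories/knowledge-base"  # adjust to your server

def accumulate_reply(events):
    """Join the content of 'text' events into the full assistant reply."""
    return "".join(e["content"] for e in events if e.get("type") == "text")

def chat_turn(messages, user_text):
    """Run one ephemeral turn: append the user message, stream the
    response, and append the assistant reply so the next turn
    carries the full history."""
    messages.append({"role": "user", "content": user_text})
    resp = requests.post(f"{BASE}/chat", json={"messages": messages}, stream=True)
    events = []
    for raw in resp.iter_lines():
        line = raw.decode() if raw else ""
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    reply = accumulate_reply(events)
    messages.append({"role": "assistant", "content": reply})
    return reply
```

With persistent threads the server stores this history for you; the append-and-resend loop here is only needed in ephemeral mode.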
What are the security constraints?
Chat is blocked entirely when HIPAA policy is active. LLM API keys are stored in encrypted server-side settings and never exposed to the client.
HIPAA compliance requires that protected health information never transits to third-party LLM providers, so Areev disables the chat endpoint while a HIPAA policy is in effect. PII detection, when enabled, scans user messages before they are sent to the LLM. All tool executions thread the AuthIdentity through the normal audit trail and policy checks.
| Constraint | Behavior |
|---|---|
| HIPAA policy active | Chat returns a policy violation error |
| PII detection enabled | User messages are scanned before sending to LLM |
| Auth required | AuthIdentity is threaded through all tool executions |
| Conversation storage | Persistent via threads, or ephemeral via message array |
| Tool execution | All tools go through normal audit trail and policy checks |
Related
- Search — direct search without LLM
- CAL — declarative query language
- Add and Query — the operations chat tools use