Ingest
How do I ingest a document?
Use areev import-doc or POST /api/memories/{id}/import-document to import a file. Areev parses the document, splits it into text chunks, and stores each chunk as an Event grain.
The ingest pipeline turns unstructured files into searchable AI memory. Supported formats include PDF, DOCX, PPTX, HTML, and plain text (TXT). Each chunk is stored with metadata linking it back to the source document, including page numbers for PDFs and slide numbers for presentations. This means your context database retains provenance — you can trace any recalled grain back to its source page.
Ingested grains are immediately available for recall, search, and chat. The pipeline assigns a default confidence of 0.8 to all ingested grains, which you can override with the confidence parameter. Namespace and tags apply to every grain created in the batch, making it straightforward to organize ingested content alongside manually added grains.
```http
POST /api/memories/knowledge-base/import-document
Content-Type: multipart/form-data

file: @report.pdf
config: {"namespace": "reports", "tags": ["q1-2026"], "chunk_size": 1000}
```

```shell
areev import-doc --path ./docs/report.pdf --namespace reports --tags q1-2026 --chunk-size 1000
```
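For programmatic ingest without the CLI, the same import call can be scripted. The sketch below assembles the multipart/form-data request using only the Python standard library; the base URL, the helper names, and the absence of authentication are assumptions for illustration, not part of the Areev API.

```python
import json
import urllib.request
import uuid

def build_multipart(file_name, file_bytes, config, boundary=None):
    """Assemble a multipart/form-data body with a `file` part and a `config` part."""
    boundary = boundary or uuid.uuid4().hex
    parts = [
        f"--{boundary}",
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"',
        "Content-Type: application/octet-stream",
        "",
        file_bytes.decode("latin-1"),  # latin-1 round-trips arbitrary bytes
        f"--{boundary}",
        'Content-Disposition: form-data; name="config"',
        "",
        json.dumps(config),
        f"--{boundary}--",
        "",
    ]
    return "\r\n".join(parts).encode("latin-1"), boundary

def import_document(base_url, memory_id, file_name, file_bytes, config):
    """POST a document to the import-document endpoint; returns the parsed JSON reply."""
    body, boundary = build_multipart(file_name, file_bytes, config)
    req = urllib.request.Request(
        f"{base_url}/api/memories/{memory_id}/import-document",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In practice a client library such as `requests` handles the multipart encoding for you; the manual version above just makes the request shape explicit.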
How do I control chunking?
Configure chunk_size and chunk_overlap to control how documents are split. Smaller chunks improve retrieval precision; larger chunks preserve more context per grain.
The chunker splits document text at character boundaries, respecting sentence and paragraph breaks where possible. Overlap ensures that information spanning a chunk boundary appears in both adjacent grains, preventing context loss at split points. For dense technical documents, a chunk size of 500 with 150 overlap works well. For narrative or conversational content, 1500–2000 with 200 overlap preserves more continuity.
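The interaction between chunk_size and chunk_overlap can be sketched with a plain character-based splitter (illustrative only; Areev's chunker additionally respects sentence and paragraph breaks, which this sketch does not):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into chunks of at most chunk_size characters,
    where each chunk repeats the last chunk_overlap characters
    of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # how far the window advances each iteration
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

With chunk_size=4 and chunk_overlap=2, "abcdefghij" splits into ["abcd", "cdef", "efgh", "ghij", "ij"]; note how every boundary character appears in two adjacent chunks.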
Default parameters are chunk_size=1000, chunk_overlap=100, confidence=0.8, and namespace=default. You can also set user_id to attribute ingested grains to a specific actor in the audit trail.
| Parameter | Default | Description |
|---|---|---|
| chunk_size | 1000 | Maximum characters per text chunk |
| chunk_overlap | 100 | Overlap characters between adjacent chunks |
| confidence | 0.8 | Confidence value assigned to all grains |
| namespace | default | Target namespace for ingested grains |
| tags | [] | Tags applied to every grain |
| user_id | — | User ID for audit logging |
```shell
areev import-doc --path ./docs/manual.pdf --chunk-size 500 --chunk-overlap 150
areev import-doc --path ./docs/manual.pdf --chunk-size 2000 --chunk-overlap 200
areev import-doc --path ./docs/manual.pdf --confidence 0.9 --user-id "ingest-bot"
```

```http
POST /api/memories/knowledge-base/import-document
Content-Type: multipart/form-data

file: @manual.pdf
config: {"chunk_size": 500, "chunk_overlap": 150, "confidence": 0.9}
```
How does staged ingest work?
The staged ingest flow uses two endpoints: extract text chunks from a document, review them, then write the ones you want as grains via batch-add.
Use POST /api/memories/{id}/extract-document to parse a document and return text chunks without writing any grains. The response includes the filename, format, total_chunks, and a chunks array with the extracted text. Review the chunks, then use POST /api/memories/{id}/batch-add to write the selected chunks as grains. This separation lets you inspect and filter content before it enters the context database.
This is the workflow the Areev App UI uses for document import with review. You control which chunks become grains and can adjust fields, tags, or namespaces before committing.
```http
POST /api/memories/knowledge-base/extract-document
Content-Type: multipart/form-data

file: @report.pdf
```

```json
{
  "filename": "report.pdf",
  "format": "pdf",
  "total_chunks": 12,
  "chunks": [
    {"text": "Chapter 1: Introduction...", "page": 1},
    {"text": "The system architecture...", "page": 2}
  ]
}
```

```http
POST /api/memories/knowledge-base/batch-add
Content-Type: application/json

{
  "namespace": "reports",
  "tags": ["q1-2026"],
  "grains": [
    {"grain_type": "event", "fields": {"content": "Chapter 1: Introduction..."}},
    {"grain_type": "event", "fields": {"content": "The system architecture..."}}
  ]
}
```
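The review step between the two calls is plain data manipulation. A minimal sketch, assuming the extract-document response shape shown above; the min_length filter is a hypothetical review rule for dropping fragments like bare page numbers, not an Areev parameter:

```python
def build_batch_payload(extract_response, namespace, tags, min_length=50):
    """Turn reviewed chunks from an extract-document response into a
    batch-add request body, keeping only chunks of at least min_length
    characters (min_length is an illustrative filter, not an API field)."""
    grains = [
        {"grain_type": "event", "fields": {"content": chunk["text"]}}
        for chunk in extract_response["chunks"]
        if len(chunk["text"]) >= min_length
    ]
    return {"namespace": namespace, "tags": tags, "grains": grains}
```

Any review logic fits here: dropping chunks, editing their text, or routing different pages to different namespaces before the batch-add call.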
How do I bulk-import raw grains?
Use POST /api/memories/{id}/batch-add to import structured grains directly without document parsing. This is the right approach for migrating data from another system or importing pre-processed content.
Batch-add accepts an array of typed grains with a shared namespace, tags, and optional source_filename. Each grain specifies its grain_type and fields object. The endpoint returns HTTP 201 if all grains succeed, HTTP 207 for partial success, and HTTP 400 if all fail. The response includes separate added and errors arrays with per-grain status.
This is distinct from document ingest because you control the grain structure entirely. There is no chunking, no LLM extraction — grains are written exactly as you provide them. Use this when you have already processed content into the AI agent memory grain format.
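Because grains are written exactly as provided, it can be worth validating them client-side before calling batch-add. A minimal sketch; both the checks and the set of grain types are assumptions based on the examples on this page, not a complete Areev schema:

```python
# Types seen in this guide's examples; assumed not exhaustive.
KNOWN_GRAIN_TYPES = {"belief", "event", "observation"}

def validate_grain(grain):
    """Return a list of problems with a grain dict (empty list means it
    passed these illustrative checks)."""
    problems = []
    gtype = grain.get("grain_type")
    if gtype not in KNOWN_GRAIN_TYPES:
        problems.append(f"unknown grain_type: {gtype!r}")
    fields = grain.get("fields")
    if not isinstance(fields, dict) or not fields:
        problems.append("fields must be a non-empty object")
    return problems
```

Catching malformed grains before the request avoids sifting through the per-grain errors array after a partial-success response.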
```http
POST /api/memories/knowledge-base/batch-add
Content-Type: application/json

{
  "namespace": "imported",
  "source_filename": "export-2026-03.json",
  "tags": ["migration"],
  "grains": [
    {"grain_type": "belief", "fields": {"subject": "john", "relation": "works_at", "object": "Acme Corp"}},
    {"grain_type": "event", "fields": {"content": "john joined the team"}},
    {"grain_type": "observation", "fields": {"content": "john likes coffee in the morning"}}
  ]
}
```

```shell
areev import --path ./exported-grains/
```
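For large migrations, a common pattern is to send grains in fixed-size batches and inspect each response's added and errors arrays. A sketch assuming the status codes described above; the batch size of 100 is an arbitrary choice, not a documented Areev limit:

```python
def batch_grains(grains, batch_size=100):
    """Yield fixed-size batches of grains for successive batch-add calls."""
    for i in range(0, len(grains), batch_size):
        yield grains[i:i + batch_size]

def summarize_batch_response(status_code, body):
    """Map a batch-add response to (added_count, error_count).

    Per the docs: 201 = all grains succeeded, 207 = partial success,
    400 = all failed."""
    if status_code not in (201, 207, 400):
        raise ValueError(f"unexpected status: {status_code}")
    return len(body.get("added", [])), len(body.get("errors", []))
```

Logging the error entries from each 207 response gives you a precise list of grains to fix and resubmit, rather than retrying the whole batch.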
Related
- Add and Query — adding individual grains
- Managing Memories — creating memory instances
- Hooks — react to ingested grains with webhooks