Provenance

What does a provenance record contain?

Each provenance record captures the complete decision trace of a single recall() invocation. It stores enough information to explain why specific memories were returned and others were excluded, without storing raw grain content. Areev uses these records to make AI memory decisions auditable and reproducible.

The record includes a recall_id (a SHA-256 content address computed over all fields), timestamp, namespace, and query parameters (with flags for text query, embedding, grain type, SPO blinding, and contradiction detection). It lists the returned grain hashes with score breakdowns (BM25 rank, vector score, RRF fusion score, recency decay, interference penalty, final score) and the excluded candidates with reasons (capped at 200 entries), along with the candidate count, the result count, and an audit_entry_hash linking to the corresponding MemoryRecalled audit event. Because the recall_id is content-addressed, each record is self-verifying: recomputing the hash exposes any modification.

Python:

import requests

# Retrieve a provenance record by its recall ID
resp = requests.get("http://localhost:4009/api/memories/default/provenance/a1b2c3d4")
resp.raise_for_status()  # fail early on a 4xx/5xx response
record = resp.json()
# Score breakdowns: bm25_rank, vector_score, rrf_score, recency_decay, final_score

HTTP:

GET /api/memories/default/provenance/a1b2c3d4 HTTP/1.1
Host: localhost:4009

CLI:

areev provenance --recall-id a1b2c3d4
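
To illustrate what "self-verifying through a content-addressed ID" can mean in practice, here is a minimal sketch. It assumes the ID is the SHA-256 of a canonical, sorted-key JSON encoding of every field except recall_id itself. The actual serialization Areev uses is not specified here, and the field names in the sample record are illustrative only.

```python
import hashlib
import json


def compute_recall_id(record: dict) -> str:
    """Hash every field except recall_id (assumed canonical form: sorted-key JSON)."""
    body = {k: v for k, v in record.items() if k != "recall_id"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def is_self_consistent(record: dict) -> bool:
    """True if the stored recall_id matches the hash of the remaining fields."""
    return record.get("recall_id") == compute_recall_id(record)


# Hypothetical record carrying a few of the fields described above.
record = {
    "namespace": "default",
    "candidate_count": 12,
    "result_count": 3,
    "returned": [{"grain_hash": "ab12", "final_score": 0.82}],
}
record["recall_id"] = compute_recall_id(record)

# Changing any field after the fact breaks the match.
tampered = dict(record, result_count=4)
```

Under these assumptions, is_self_consistent(record) holds for the original record and fails for the tampered copy, which is the property a content-addressed ID is meant to provide.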

How does Areev track exclusion reasons?

Areev records why each candidate grain was excluded from recall results, with each reason mapping to a filtering stage in the recall pipeline. This supports explanations under EU AI Act Art. 86 by giving a transparent decision rationale for every excluded grain.

The 21 exclusion types cover regulatory filters (ProcessingRestricted per GDPR Art. 18, TtlExpired), quality filters (BelowConfidence, BelowMinScore, BelowImportance), semantic filters (Contradicted, ContradictedFiltered, Superseded, ConflictResolved, SupersessionDemoted, Deduplicated), structural filters (TagMismatch, TypeMismatch, NamespaceMismatch, UserIdMismatch, SubjectMismatch, ObjectMismatch, OutsideTimeRange, NamespaceCapped), and ranking filters (DiversityFiltered, BeyondLimit). Exclusion lists are capped at 200 entries per record to prevent unbounded growth, with an exclusions_truncated flag and total_exclusion_count indicating when the cap is reached.

Reason                Description
--------------------  ---------------------------------------------------
ProcessingRestricted  User’s processing restricted (GDPR Art. 18)
Superseded            Replaced by a newer version
BelowConfidence       Below query confidence threshold
BelowMinScore         Below minimum score threshold
BelowImportance       Below importance threshold
Contradicted          Penalized as a contradiction
ContradictedFiltered  Non-preferred side of contradiction pair
TagMismatch           Does not match required tag filters
TypeMismatch          Grain type does not match
NamespaceMismatch     Namespace does not match
UserIdMismatch        User ID does not match
SubjectMismatch       Subject does not contain required substring
ObjectMismatch        Object does not contain required substring
OutsideTimeRange      Falls outside requested time range
TtlExpired            Exceeded policy TTL ceiling
DiversityFiltered     Removed by diversity reranking
BeyondLimit           Beyond the result limit
Deduplicated          Near-duplicate of canonical grain
ConflictResolved      Newer grain with same (subject, relation) preferred
SupersessionDemoted   Demoted by supersession-aware scoring
NamespaceCapped       Namespace capped by max_namespaces
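
The grouping of reasons and the 200-entry cap described above can be sketched as follows. This is an illustrative model only: the ExclusionLog class and FILTER_GROUPS mapping are hypothetical names, and only the reason names, the cap of 200, and the exclusions_truncated and total_exclusion_count fields come from the text.

```python
from dataclasses import dataclass, field

MAX_EXCLUSIONS = 200  # cap stated in the text

# Reason names grouped by the five filter families listed above.
FILTER_GROUPS = {
    "regulatory": {"ProcessingRestricted", "TtlExpired"},
    "quality": {"BelowConfidence", "BelowMinScore", "BelowImportance"},
    "semantic": {"Contradicted", "ContradictedFiltered", "Superseded",
                 "ConflictResolved", "SupersessionDemoted", "Deduplicated"},
    "structural": {"TagMismatch", "TypeMismatch", "NamespaceMismatch",
                   "UserIdMismatch", "SubjectMismatch", "ObjectMismatch",
                   "OutsideTimeRange", "NamespaceCapped"},
    "ranking": {"DiversityFiltered", "BeyondLimit"},
}


@dataclass
class ExclusionLog:
    """Hypothetical container: stores at most MAX_EXCLUSIONS entries but counts all."""
    entries: list = field(default_factory=list)  # (grain_hash, reason) pairs
    total_exclusion_count: int = 0
    exclusions_truncated: bool = False

    def add(self, grain_hash: str, reason: str) -> None:
        self.total_exclusion_count += 1
        if len(self.entries) < MAX_EXCLUSIONS:
            self.entries.append((grain_hash, reason))
        else:
            # Entry is counted but not stored once the cap is hit.
            self.exclusions_truncated = True


log = ExclusionLog()
for i in range(250):
    log.add(f"grain-{i}", "BelowMinScore")
```

In this sketch the record keeps the first 200 exclusions, while total_exclusion_count still reports all 250, so a reader can tell how much was dropped.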

How does Areev protect provenance privacy?

Provenance records are designed for data minimization: they never contain raw grain content. When encryption is active, records are AES-256-GCM encrypted at rest with an HKDF-derived key, and query parameter fields (subject, relation, object, user_id) are stored as HMAC-SHA256 blind tokens. As a result, even the search terms in the AI memory decision log cannot be read in plaintext.

The encryption key for provenance records derives from the master key via HKDF("areev-provenance-key"), separate from user DEKs. The blind key for SPO fields uses a dedicated derivation path, preventing cross-domain key reuse. Query text is excluded from records (it may contain PII), and embedding vectors are excluded (large and not useful for explanation). Actor IDs are pseudonymized via the audit trail’s HMAC mechanism.

Privacy protections:
  1. No raw grain content — only hashes and scores
  2. SPO fields -> HMAC-SHA256 blind tokens (when encrypted)
  3. Actor ID -> pseudonymized via audit trail HMAC
  4. Query text excluded (may contain PII)
  5. Embedding vectors excluded
  6. Records encrypted at rest with HKDF-derived key
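
The key separation and blind-token ideas above can be sketched with the Python standard library. This is a minimal illustration, not Areev's implementation: HKDF is written out per RFC 5869 with SHA-256, the "areev-provenance-key" label comes from the text, and the blind-key label, the all-zero master key, and the function names are placeholders assumed for the example.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it using `info`."""
    prk = hmac.new(salt or b"\x00" * 32, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def blind_token(blind_key: bytes, value: str) -> str:
    """Deterministic HMAC-SHA256 token: equal inputs match, plaintext is not stored."""
    return hmac.new(blind_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


master_key = bytes(32)  # placeholder only; a real master key must be random
provenance_key = hkdf_sha256(master_key, b"areev-provenance-key")
blind_key = hkdf_sha256(master_key, b"example-blind-key")  # hypothetical label

token = blind_token(blind_key, "alice")
```

Because each derived key uses a different info label, the encryption key and the blind key are independent even though both come from the same master key. The token is deterministic, so two records that queried the same subject carry the same token without revealing the subject itself.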

How does Areev link provenance to the audit trail?

Every provenance record contains the audit_entry_hash of its corresponding MemoryRecalled audit event, forming a cryptographic cross-reference. Areev verifies the link in both directions to detect orphaned records.

The linking sequence runs in four steps: recall() executes and builds a ProvenanceRecord, the audit trail appends a MemoryRecalled event, the audit entry hash is set on the record, and the record is stored. The verify_provenance_links() function checks both directions: every MemoryRecalled audit entry must have a provenance record, and every provenance record must have an audit entry. Records default to 180-day retention per EU AI Act Art. 19, configurable per namespace or globally. The cleanup_expired() method removes records older than their retention period.
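
The two-way check and the retention rule can be modeled in a few lines. The sketch below is an assumption-laden illustration rather than the behavior of verify_provenance_links() or cleanup_expired(): the function names, the dictionary and set inputs, and the report keys are all hypothetical, and only the 180-day default comes from the text.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(days=180)  # default stated in the text


def find_broken_links(provenance: dict, audit_hashes: set) -> dict:
    """Check both directions of the provenance/audit cross-reference.

    provenance maps recall_id -> audit_entry_hash;
    audit_hashes holds the hashes of all MemoryRecalled audit entries.
    """
    linked = set(provenance.values())
    return {
        # Provenance records whose audit entry does not exist.
        "orphaned_records": sorted(
            rid for rid, h in provenance.items() if h not in audit_hashes
        ),
        # Audit entries that no provenance record points to.
        "unlinked_audit_entries": sorted(audit_hashes - linked),
    }


def is_expired(created_at: datetime, now: datetime,
               retention: timedelta = DEFAULT_RETENTION) -> bool:
    """True once a record is older than its retention period."""
    return now - created_at > retention


provenance = {"r1": "a1", "r2": "a2", "r3": "missing"}
audit_hashes = {"a1", "a2", "a3"}
report = find_broken_links(provenance, audit_hashes)

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
old_record = now - timedelta(days=181)
recent_record = now - timedelta(days=179)
```

Here r3 is reported as an orphaned record and a3 as an audit entry with no provenance record, which are the two failure modes the bidirectional check is meant to surface.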

Python:

import requests

# Get provenance summary (link statistics)
resp = requests.get("http://localhost:4009/api/memories/default/provenance/summary")
resp.raise_for_status()  # fail early on a 4xx/5xx response
result = resp.json()

HTTP:

GET /api/memories/default/provenance/summary HTTP/1.1
Host: localhost:4009

Note: Full provenance link verification (checking every provenance-audit cross-reference) is available via CLI with areev verify --provenance. The HTTP endpoint above returns a provenance summary; use the CLI for exhaustive link validation.

CLI:

areev provenance --verify-links