Provenance
What does a provenance record contain?
Each provenance record captures the complete decision trace of a single recall() invocation. It stores enough information to explain why specific memories were returned and others were excluded, without storing raw grain content. Areev uses provenance records to make AI memory decisions auditable and reproducible.
The record includes a recall_id (a SHA-256 content address computed over all fields), timestamp, namespace, query parameters (with flags for text query, embedding, grain type, SPO blinding, and contradiction detection), returned grain hashes with score breakdowns (BM25 rank, vector score, RRF fusion score, recency decay, interference penalty, final score), excluded candidates with reasons (capped at 200 entries), candidate count, result count, and an audit_entry_hash linking to the corresponding MemoryRecalled audit event. Because the recall_id is a content address, each record is self-verifying: recomputing the hash over a stored record reveals any modification.
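The self-verification idea can be illustrated with a short sketch. It assumes the ID is the SHA-256 of a sorted-key JSON serialization of every field except recall_id itself; the canonicalization Areev actually uses is not specified here, and the sample field values are illustrative.

```python
import hashlib
import json


def compute_recall_id(record: dict) -> str:
    """Hash every field except recall_id (assumed canonical form: sorted-key JSON)."""
    body = {k: v for k, v in record.items() if k != "recall_id"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_self_consistent(record: dict) -> bool:
    """True when the stored recall_id matches the recomputed content address."""
    return record.get("recall_id") == compute_recall_id(record)


# Hypothetical record carrying a few of the fields listed above.
sample = {
    "namespace": "default",
    "timestamp": "2025-01-01T00:00:00+00:00",
    "candidate_count": 12,
    "result_count": 3,
}
sample["recall_id"] = compute_recall_id(sample)
```

Changing any field after the ID is assigned makes `is_self_consistent` return False, which is the property that makes a content-addressed record tamper-evident.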
import requests

# Retrieve a provenance record by its recall_id
resp = requests.get(
    "http://localhost:4009/api/memories/default/provenance/a1b2c3d4",
    timeout=10,
)
resp.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body
record = resp.json()
# Score breakdowns: bm25_rank, vector_score, rrf_score, recency_decay, final_score
GET /api/memories/default/provenance/a1b2c3d4 HTTP/1.1
Host: localhost:4009
areev provenance --recall-id a1b2c3d4
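Once a record has been fetched, its score breakdown can be turned into a readable explanation. The following is a minimal sketch: the JSON shape (`returned`, `grain_hash`, `scores`) is an assumption made for illustration, and only the score names are taken from the example above.

```python
def explain_returned(record: dict) -> list:
    """Build one explanation line per returned grain, highest final score first."""
    grains = sorted(
        record.get("returned", []),
        key=lambda g: g["scores"]["final_score"],
        reverse=True,
    )
    lines = []
    for g in grains:
        s = g["scores"]
        lines.append(
            f"{g['grain_hash']}: final={s['final_score']:.3f} "
            f"(bm25_rank={s['bm25_rank']}, vector={s['vector_score']:.3f}, "
            f"rrf={s['rrf_score']:.3f}, recency={s['recency_decay']:.3f})"
        )
    return lines


# Hypothetical record fragment; real records may be structured differently.
sample_record = {
    "returned": [
        {"grain_hash": "aaaa", "scores": {"bm25_rank": 2, "vector_score": 0.71,
                                          "rrf_score": 0.031, "recency_decay": 0.9,
                                          "final_score": 0.62}},
        {"grain_hash": "bbbb", "scores": {"bm25_rank": 1, "vector_score": 0.88,
                                          "rrf_score": 0.033, "recency_decay": 0.95,
                                          "final_score": 0.81}},
    ]
}
explanation = explain_returned(sample_record)
```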
How does Areev track exclusion reasons?
Areev records why each candidate grain was excluded from recall results, with each reason mapping to a filtering stage in the recall pipeline. This supports explanations under EU AI Act Art. 86 by giving a decision rationale for each recorded exclusion.
The 21 exclusion types cover regulatory filters (ProcessingRestricted per GDPR Art. 18, TtlExpired), quality filters (BelowConfidence, BelowMinScore, BelowImportance), semantic filters (Contradicted, ContradictedFiltered, Superseded, ConflictResolved, SupersessionDemoted, Deduplicated), structural filters (TagMismatch, TypeMismatch, NamespaceMismatch, UserIdMismatch, SubjectMismatch, ObjectMismatch, OutsideTimeRange, NamespaceCapped), and ranking filters (DiversityFiltered, BeyondLimit). Exclusion lists are capped at 200 entries per record to prevent unbounded growth, with an exclusions_truncated flag and total_exclusion_count indicating when the cap is reached.
| Reason | Description |
|---|---|
| ProcessingRestricted | User’s processing restricted (GDPR Art. 18) |
| Superseded | Replaced by a newer version |
| BelowConfidence | Below query confidence threshold |
| BelowMinScore | Below minimum score threshold |
| BelowImportance | Below importance threshold |
| Contradicted | Penalized as a contradiction |
| ContradictedFiltered | Non-preferred side of contradiction pair |
| TagMismatch | Does not match required tag filters |
| TypeMismatch | Grain type does not match |
| NamespaceMismatch | Namespace does not match |
| UserIdMismatch | User ID does not match |
| SubjectMismatch | Subject does not contain required substring |
| ObjectMismatch | Object does not contain required substring |
| OutsideTimeRange | Falls outside requested time range |
| TtlExpired | Exceeded policy TTL ceiling |
| DiversityFiltered | Removed by diversity reranking |
| BeyondLimit | Beyond the result limit |
| Deduplicated | Near-duplicate of canonical grain |
| ConflictResolved | Newer grain with same (subject, relation) preferred |
| SupersessionDemoted | Demoted by supersession-aware scoring |
| NamespaceCapped | Namespace capped by max_namespaces |
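The 200-entry cap and its two companion fields can be sketched as follows. This is an illustration under assumptions: the entry shape (`grain_hash`, `reason`) is hypothetical, and the sketch assumes total_exclusion_count holds the count before truncation and that the flag is set only when entries were actually dropped.

```python
from collections import Counter

MAX_EXCLUSIONS = 200  # cap stated in the text


def cap_exclusions(exclusions: list, limit: int = MAX_EXCLUSIONS) -> dict:
    """Keep at most `limit` entries and record whether anything was dropped."""
    return {
        "exclusions": exclusions[:limit],
        "exclusions_truncated": len(exclusions) > limit,
        "total_exclusion_count": len(exclusions),
    }


def count_by_reason(exclusions: list) -> Counter:
    """Tally exclusions per reason, e.g. to see which filter removed the most grains."""
    return Counter(e["reason"] for e in exclusions)


# 250 hypothetical exclusions: 150 BeyondLimit followed by 100 TagMismatch.
candidates = [{"grain_hash": f"h{i}", "reason": "BeyondLimit"} for i in range(150)]
candidates += [{"grain_hash": f"h{i}", "reason": "TagMismatch"} for i in range(150, 250)]
capped = cap_exclusions(candidates)
```

A consumer that sees `exclusions_truncated` set should treat per-reason tallies over the stored list as a lower bound, since the dropped entries are no longer available.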
How does Areev protect provenance privacy?
Provenance records are designed for data minimization: they never contain raw grain content. When encryption is active, records are AES-256-GCM encrypted at rest with an HKDF-derived key, and query parameter fields (subject, relation, object, user_id) are stored as HMAC-SHA256 blind tokens. As a result, even the search terms in the decision log cannot be read in plaintext.
The encryption key for provenance records derives from the master key via HKDF("areev-provenance-key"), separate from user DEKs. The blind key for SPO fields uses a dedicated derivation path, preventing cross-domain key reuse. Query text is excluded from records (it may contain PII), and embedding vectors are excluded (large and not useful for explanation). Actor IDs are pseudonymized via the audit trail’s HMAC mechanism.
Privacy protections:
1. No raw grain content — only hashes and scores
2. SPO fields -> HMAC-SHA256 blind tokens (when encrypted)
3. Actor ID -> pseudonymized via audit trail HMAC
4. Query text excluded (may contain PII)
5. Embedding vectors excluded
6. Records encrypted at rest with HKDF-derived key
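The key separation and blind-token ideas above can be sketched with the Python standard library alone. This is a minimal illustration, not Areev's implementation: it assumes the "areev-provenance-key" label is passed as the HKDF `info` parameter, uses a made-up label for the blind key, and omits the AES-256-GCM step, which needs a third-party library.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it with `info`."""
    hash_len = hashlib.sha256().digest_size
    prk = hmac.new(salt or b"\x00" * hash_len, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def blind_token(blind_key: bytes, value: str) -> str:
    """Deterministic HMAC-SHA256 token: equal inputs match, plaintext is not stored."""
    return hmac.new(blind_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


master = bytes(range(32))  # demo key only; never hard-code a real master key
provenance_key = hkdf_sha256(master, b"areev-provenance-key")
blind_key = hkdf_sha256(master, b"example-spo-blind-key")  # hypothetical label
token = blind_token(blind_key, "alice")
```

Because the two keys are derived with different labels, compromising one does not reveal the other, and because the token is deterministic, records can still be matched on a subject or user ID without storing it in plaintext.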
How does provenance link to the audit trail?
Every provenance record contains the audit_entry_hash of its corresponding MemoryRecalled audit event, forming a cryptographic cross-reference. Areev verifies the link in both directions to detect orphaned records.
The linking sequence runs as follows: recall() executes and builds a ProvenanceRecord, the audit trail appends a MemoryRecalled event, the audit entry hash is set on the record, and the record is stored. The verify_provenance_links() function checks both directions — every MemoryRecalled audit entry has a provenance record, and every provenance record has an audit entry. Records default to 180-day retention per EU AI Act Art. 19, configurable per namespace or globally. The cleanup_expired() method removes records beyond their retention period.
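The retention rule can be sketched as a partition of records into kept and expired sets. This is an illustrative helper, not the cleanup_expired() method itself; the record fields and the per-namespace override mapping are assumptions.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 180  # default stated in the text


def split_expired(records, now, retention_days=DEFAULT_RETENTION_DAYS, per_namespace=None):
    """Partition records into (kept, expired), honoring per-namespace overrides."""
    per_namespace = per_namespace or {}
    kept, expired = [], []
    for rec in records:
        days = per_namespace.get(rec["namespace"], retention_days)
        age = now - datetime.fromisoformat(rec["timestamp"])
        (expired if age > timedelta(days=days) else kept).append(rec)
    return kept, expired


# Hypothetical records; timestamps are timezone-aware ISO 8601 strings.
now = datetime(2025, 7, 1, tzinfo=timezone.utc)
records = [
    {"recall_id": "r1", "namespace": "default", "timestamp": "2025-06-01T00:00:00+00:00"},
    {"recall_id": "r2", "namespace": "default", "timestamp": "2024-12-01T00:00:00+00:00"},
    {"recall_id": "r3", "namespace": "short", "timestamp": "2025-06-01T00:00:00+00:00"},
]
kept, expired = split_expired(records, now, per_namespace={"short": 7})
```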
import requests

# Get provenance summary (link statistics)
resp = requests.get(
    "http://localhost:4009/api/memories/default/provenance/summary",
    timeout=10,
)
resp.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body
result = resp.json()
GET /api/memories/default/provenance/summary HTTP/1.1
Host: localhost:4009
Note: Full provenance link verification (checking every provenance-audit cross-reference) is available via the CLI with areev verify --provenance. The HTTP endpoint above returns a provenance summary; use the CLI for exhaustive link validation.
areev provenance --verify-links
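The two-directional check described in this section amounts to a pair of set differences. The sketch below is a hypothetical stand-in for that logic, not the verify_provenance_links() function itself; the audit entry fields (`hash`, `event`) are assumptions.

```python
def find_orphans(audit_entries: list, provenance_records: list) -> dict:
    """Report audit hashes with no record and recall_ids with no audit entry."""
    recalled = {e["hash"] for e in audit_entries if e["event"] == "MemoryRecalled"}
    linked = {r["audit_entry_hash"] for r in provenance_records}
    return {
        "audit_without_provenance": sorted(recalled - linked),
        "provenance_without_audit": sorted(
            r["recall_id"] for r in provenance_records
            if r["audit_entry_hash"] not in recalled
        ),
    }


# Hypothetical data: audit entry a2 has no record, record p3 has no audit entry.
audit = [
    {"hash": "a1", "event": "MemoryRecalled"},
    {"hash": "a2", "event": "MemoryRecalled"},
    {"hash": "a9", "event": "MemoryStored"},  # other event types are ignored
]
provenance = [
    {"recall_id": "p1", "audit_entry_hash": "a1"},
    {"recall_id": "p3", "audit_entry_hash": "a7"},
]
report = find_orphans(audit, provenance)
```

An empty list on both sides means every MemoryRecalled entry has a provenance record and every record points at a real audit entry.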
Related
- EU AI Act: Art. 12 record-keeping and Art. 86 explanation
- Audit Trail: Hash-chained audit entries
- Encryption: Encryption of provenance records