Provenance
What does a provenance record contain?
Each provenance record captures the complete decision trace of a single recall() invocation, storing enough information to explain why specific memories were returned and others were excluded without storing raw grain content. This context database uses provenance records to make AI memory decisions auditable and reproducible.
The record includes a recall_id (SHA-256 content address computed over all fields), timestamp, namespace, query parameters (with flags for text query, embedding, grain type, SPO blinding, contradiction detection), returned grain hashes with score breakdowns (BM25 rank, vector score, RRF fusion score, recency decay, interference penalty, final score), excluded candidates with reasons (capped at 200 entries), candidate count, result count, and audit_entry_hash linking to the corresponding MemoryRecalled audit event. The autonomous memory engine makes each record self-verifying through its content-addressed ID.
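import requests
# Retrieve a provenance record (same bearer token as the raw HTTP request below)
resp = requests.get(
    "https://acme.areev.ai/api/memories/default/provenance/a1b2c3d4",
    headers={"Authorization": "Bearer ar_..."},
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
record = resp.json()
# Score breakdowns: bm25_rank, vector_score, rrf_score, recency_decay, final_score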
GET /api/memories/default/provenance/a1b2c3d4 HTTP/1.1
Host: acme.areev.ai
Authorization: Bearer ar_...
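Because the recall_id is a SHA-256 content address, a client could in principle recompute it and compare it with the stored value. The sketch below shows only the idea. The canonical serialization (sorted-key, compact JSON), the exclusion of the recall_id field from the hash input, and the helper names content_address and is_self_consistent are all assumptions made for illustration; the exact bytes Areev hashes are not specified here.

```python
import hashlib
import json


def content_address(record: dict) -> str:
    """Hash every field except recall_id using a canonical JSON encoding.

    Assumption: sorted-key, compact JSON stands in for the real serialization.
    """
    payload = {k: v for k, v in record.items() if k != "recall_id"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_self_consistent(record: dict) -> bool:
    """Return True when the stored recall_id matches the recomputed hash."""
    return record.get("recall_id") == content_address(record)


# Hypothetical record carrying a few of the fields described above.
record = {
    "namespace": "default",
    "timestamp": "2025-01-01T00:00:00Z",
    "candidate_count": 42,
    "result_count": 5,
}
record["recall_id"] = content_address(record)

print(is_self_consistent(record))  # True
record["result_count"] = 6         # any change to a field changes the hash
print(is_self_consistent(record))  # False
```

The useful property is that no external reference is needed: a modified field is detectable from the record alone, as long as verifier and writer agree on the serialization.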
How does Areev track exclusion reasons?
Areev records why each candidate grain was excluded from recall results, mapping each reason to a filtering stage in the recall pipeline. This AI agent memory system supports EU AI Act Art. 86 explanations by recording a decision rationale for each excluded grain, up to the per-record cap described below.
The 28 exclusion types cover regulatory filters (ProcessingRestricted per GDPR Art. 18, TtlExpired), quality filters (BelowConfidence, BelowMinScore, BelowImportance), semantic filters (Contradicted, ContradictedFiltered, Superseded, ConflictResolved, SupersessionDemoted, Deduplicated), structural filters (TagMismatch, TypeMismatch, NamespaceMismatch, UserIdMismatch, SubjectMismatch, ObjectMismatch, OutsideTimeRange, NamespaceCapped), planner-promoted post-filter mismatches (RelationMismatch, SubjectExactMismatch, ObjectExactMismatch, EntityMismatch, SubjectInMismatch, RelationInMismatch, ObjectInMismatch), and ranking filters (DiversityFiltered, BeyondLimit). Exclusion lists are capped at 200 entries per record to prevent unbounded growth, with an exclusions_truncated flag and total_exclusion_count indicating when the cap is reached.
| Reason | Description |
|---|---|
| ProcessingRestricted | User’s processing restricted (GDPR Art. 18) |
| Superseded | Replaced by a newer version |
| BelowConfidence | Below query confidence threshold |
| BelowMinScore | Below minimum score threshold |
| BelowImportance | Below importance threshold |
| Contradicted | Penalized as a contradiction |
| ContradictedFiltered | Non-preferred side of contradiction pair |
| TagMismatch | Does not match required tag filters |
| TypeMismatch | Grain type does not match |
| NamespaceMismatch | Namespace does not match |
| UserIdMismatch | User ID does not match |
| SubjectMismatch | Subject does not contain required substring |
| ObjectMismatch | Object does not contain required substring |
| OutsideTimeRange | Falls outside requested time range |
| TtlExpired | Exceeded policy TTL ceiling |
| DiversityFiltered | Removed by diversity reranking |
| BeyondLimit | Beyond the result limit |
| Deduplicated | Near-duplicate of canonical grain |
| ConflictResolved | Newer grain with same (subject, relation) preferred |
| SupersessionDemoted | Demoted by supersession-aware scoring |
| NamespaceCapped | Namespace capped by max_namespaces |
| RelationMismatch | Planner promoted relation filter; grain’s relation does not match (QRY-E001 semantics) |
| SubjectExactMismatch | Planner promoted exact subject filter; grain’s subject does not match |
| ObjectExactMismatch | Planner promoted exact object filter; grain’s object does not match |
| EntityMismatch | Planner promoted entity filter; grain’s subject ≠ entity AND object ≠ entity |
| SubjectInMismatch | Planner promoted subject_in filter; grain’s subject not in allow-list |
| RelationInMismatch | Planner promoted relation_in filter; grain’s relation not in allow-list |
| ObjectInMismatch | Planner promoted object_in filter; grain’s object not in allow-list |
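The 200-entry cap with its exclusions_truncated flag and total_exclusion_count can be pictured as a bounded collector. The following is a minimal sketch, not Areev's implementation: the ExclusionLog class, its add method, and the choice to keep the first 200 entries (rather than, say, the highest-scoring ones) are assumptions, as is setting the flag only once an entry has actually been dropped.

```python
from dataclasses import dataclass, field

MAX_EXCLUSIONS = 200  # per-record cap stated in the text


@dataclass
class ExclusionLog:
    """Collects (grain_hash, reason) pairs, keeping at most MAX_EXCLUSIONS."""

    entries: list = field(default_factory=list)
    total_exclusion_count: int = 0
    exclusions_truncated: bool = False

    def add(self, grain_hash: str, reason: str) -> None:
        # Always count the exclusion, even when the entry itself is dropped.
        self.total_exclusion_count += 1
        if len(self.entries) < MAX_EXCLUSIONS:
            self.entries.append((grain_hash, reason))
        else:
            self.exclusions_truncated = True


log = ExclusionLog()
for i in range(250):
    log.add(f"hash-{i}", "BeyondLimit")

print(len(log.entries), log.total_exclusion_count, log.exclusions_truncated)
# 200 250 True
```

Keeping the total alongside the truncated list lets a reader of the record see how many candidates were excluded even when only some of the reasons were stored.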
How does Areev protect provenance privacy?
Provenance records are designed for data minimization: they never contain raw grain content. When encryption is active, records are AES-256-GCM encrypted at rest with an HKDF-derived key, and query parameter fields (subject, relation, object, user_id) are stored as HMAC-SHA256 blind tokens. With encryption active, this context database ensures that even the search terms in the AI memory decision log cannot be read in plaintext.
The encryption key for provenance records derives from the master key via HKDF("areev-provenance-key"), separate from user DEKs. The blind key for SPO fields uses a dedicated derivation path, preventing cross-domain key reuse. Query text is excluded from records (it may contain PII), and embedding vectors are excluded (large and not useful for explanation). Actor IDs are pseudonymized via the audit trail’s HMAC mechanism.
Privacy protections:
1. No raw grain content — only hashes and scores
2. SPO fields -> HMAC-SHA256 blind tokens (when encrypted)
3. Actor ID -> pseudonymized via audit trail HMAC
4. Query text excluded (may contain PII)
5. Embedding vectors excluded
6. Records encrypted at rest with HKDF-derived key
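To make the key separation and blind-token ideas concrete, here is a small standard-library sketch. It implements HKDF with SHA-256 as defined in RFC 5869 and an HMAC-SHA256 token function. Several details are assumptions: that "areev-provenance-key" is passed as the HKDF info label, the label "example-blind-key" (the real blind-key derivation path is not given), the all-zero placeholder master key, and the helper names hkdf_sha256 and blind_token.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                   # expand step
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def blind_token(blind_key: bytes, value: str) -> str:
    """Deterministic keyed token: equal inputs match, plaintext is not stored."""
    return hmac.new(blind_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


master_key = bytes(32)  # placeholder only; never use an all-zero key
provenance_key = hkdf_sha256(master_key, b"areev-provenance-key")
blind_key = hkdf_sha256(master_key, b"example-blind-key")  # hypothetical label

token = blind_token(blind_key, "alice")
print(len(provenance_key), provenance_key != blind_key, len(token))
# 32 True 64
```

Distinct labels yield unrelated keys from the same master key, which is what prevents a key used for encryption from also being used for blinding. Because the token is deterministic, two records that filtered on the same subject produce the same token and can be correlated without revealing the subject.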
How does provenance link to the audit trail?
Every provenance record contains the audit_entry_hash of its corresponding MemoryRecalled audit event, forming a cryptographic cross-reference. This autonomous memory system verifies both directions of the link to detect orphaned records.
The linking sequence runs as follows: recall() executes and builds a ProvenanceRecord, the audit trail appends a MemoryRecalled event, the audit entry hash is set on the record, and the record is stored. The verify_provenance_links() function checks both directions: that every MemoryRecalled audit entry has a provenance record, and that every provenance record has an audit entry. Records default to 180-day retention per EU AI Act Art. 19, configurable per namespace or globally. The cleanup_expired() method removes records that have passed their retention period.
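The retention rule can be illustrated with a short selection function. This is a sketch of the idea only and not the cleanup_expired() method itself: the expired_ids helper, the in-memory record shape, and the per-namespace override dictionary are hypothetical, with only the 180-day default taken from the text.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 180  # default stated in the text


def expired_ids(records, now, retention_by_namespace=None):
    """Return recall_ids whose age exceeds the retention for their namespace."""
    retention_by_namespace = retention_by_namespace or {}
    expired = []
    for rec in records:
        days = retention_by_namespace.get(rec["namespace"], DEFAULT_RETENTION_DAYS)
        if now - rec["timestamp"] > timedelta(days=days):
            expired.append(rec["recall_id"])
    return expired


now = datetime(2025, 7, 1, tzinfo=timezone.utc)
records = [
    {"recall_id": "a1", "namespace": "default", "timestamp": now - timedelta(days=200)},
    {"recall_id": "b2", "namespace": "default", "timestamp": now - timedelta(days=30)},
    {"recall_id": "c3", "namespace": "short", "timestamp": now - timedelta(days=45)},
]

# The "short" namespace overrides the default with a 30-day retention.
print(expired_ids(records, now, {"short": 30}))  # ['a1', 'c3']
```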
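import requests
# Get provenance summary (link statistics)
resp = requests.get(
    "https://acme.areev.ai/api/memories/default/provenance/summary",
    headers={"Authorization": "Bearer ar_..."},
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
result = resp.json()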
GET /api/memories/default/provenance/summary HTTP/1.1
Host: acme.areev.ai
Authorization: Bearer ar_...
For exhaustive link verification (checking every provenance-audit cross-reference), call GET /api/memories/{id}/provenance/verify-links.
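Conceptually, a bidirectional check of this kind is a pair of set comparisons. The sketch below runs over in-memory lists to show what "both directions" means. The check_links helper, the dictionary shapes, and the "OtherEvent" placeholder are hypothetical and do not reflect the response format of the endpoint or the internals of verify_provenance_links().

```python
def check_links(audit_entries, provenance_records):
    """Report orphans on either side of the provenance/audit cross-reference.

    audit_entries: dicts with "hash" and "event" keys (hypothetical shape).
    provenance_records: dicts with "recall_id" and "audit_entry_hash" keys.
    """
    provenance_records = list(provenance_records)
    recalled = {e["hash"] for e in audit_entries if e["event"] == "MemoryRecalled"}
    linked = {r["audit_entry_hash"] for r in provenance_records}
    return {
        # MemoryRecalled events that no provenance record points at
        "audit_without_provenance": sorted(recalled - linked),
        # provenance records whose audit_entry_hash matches no MemoryRecalled event
        "provenance_without_audit": sorted(
            r["recall_id"] for r in provenance_records
            if r["audit_entry_hash"] not in recalled
        ),
    }


audit = [
    {"hash": "h1", "event": "MemoryRecalled"},
    {"hash": "h2", "event": "MemoryRecalled"},
    {"hash": "h3", "event": "OtherEvent"},  # placeholder for any non-recall event
]
provenance = [
    {"recall_id": "r1", "audit_entry_hash": "h1"},
    {"recall_id": "r9", "audit_entry_hash": "h7"},
]

print(check_links(audit, provenance))
# {'audit_without_provenance': ['h2'], 'provenance_without_audit': ['r9']}
```

An empty list on both sides means every recall event has a record and every record has an event; a non-empty list points directly at the orphaned entries.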
Related
- EU AI Act: Art. 12 record-keeping and Art. 86 explanation
- Audit Trail: Hash-chained audit entries
- Encryption: Encryption of provenance records