Provenance

What does a provenance record contain?

Each provenance record captures the complete decision trace of a single recall() invocation. It stores enough information to explain why specific memories were returned and others were excluded, without storing raw grain content. Areev uses these records to make recall decisions auditable and reproducible.

The record includes a recall_id (SHA-256 content address computed over all fields), timestamp, namespace, and query parameters (with flags for text query, embedding, grain type, SPO blinding, and contradiction detection). It also lists the returned grain hashes with score breakdowns (BM25 rank, vector score, RRF fusion score, recency decay, interference penalty, final score), the excluded candidates with reasons (capped at 200 entries), the candidate count, the result count, and an audit_entry_hash linking to the corresponding MemoryRecalled audit event. Because the recall_id is a content address, each record is self-verifying: recomputing the hash reveals any later modification.

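import requests

# Retrieve a provenance record; the request carries the same bearer token
# as the raw HTTP example that follows.
resp = requests.get(
    "https://acme.areev.ai/api/memories/default/provenance/a1b2c3d4",
    headers={"Authorization": "Bearer ar_..."},
    timeout=10,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
record = resp.json()
# Score breakdowns: bm25_rank, vector_score, rrf_score, recency_decay, final_score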

GET /api/memories/default/provenance/a1b2c3d4 HTTP/1.1
Host: acme.areev.ai
Authorization: Bearer ar_...
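
Because the recall_id is a content address, a client can in principle recompute it to detect tampering. The sketch below is illustrative only: it assumes the hash covers a canonical JSON encoding (sorted keys, compact separators) of every field except recall_id itself. The actual serialization is not specified on this page, and the function names are hypothetical.

```python
import hashlib
import json


def compute_recall_id(record: dict) -> str:
    """Hash every field except recall_id over an assumed canonical JSON encoding."""
    body = {k: v for k, v in record.items() if k != "recall_id"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_self_consistent(record: dict) -> bool:
    """True when the stored recall_id matches the recomputed content address."""
    return record.get("recall_id") == compute_recall_id(record)


# Hypothetical record carrying only a few of the documented fields.
sample = {"namespace": "default", "candidate_count": 12, "result_count": 3}
sample["recall_id"] = compute_recall_id(sample)
```

Changing any field after the ID is assigned makes is_self_consistent() return False, which is the property the text calls self-verifying.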

How does Areev track exclusion reasons?

Areev records why each candidate grain was excluded from recall results, with each reason mapping to a filtering stage in the recall pipeline. These reasons support EU AI Act Art. 86 explanations by giving a transparent decision rationale for each recorded exclusion.

The 28 exclusion types cover regulatory filters (ProcessingRestricted per GDPR Art. 18, TtlExpired), quality filters (BelowConfidence, BelowMinScore, BelowImportance), semantic filters (Contradicted, ContradictedFiltered, Superseded, ConflictResolved, SupersessionDemoted, Deduplicated), structural filters (TagMismatch, TypeMismatch, NamespaceMismatch, UserIdMismatch, SubjectMismatch, ObjectMismatch, OutsideTimeRange, NamespaceCapped), planner-promoted post-filter mismatches (RelationMismatch, SubjectExactMismatch, ObjectExactMismatch, EntityMismatch, SubjectInMismatch, RelationInMismatch, ObjectInMismatch), and ranking filters (DiversityFiltered, BeyondLimit). Exclusion lists are capped at 200 entries per record to prevent unbounded growth, with an exclusions_truncated flag and total_exclusion_count indicating when the cap is reached.
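
The capping behavior can be sketched as follows. The field names exclusions_truncated and total_exclusion_count come from the text; the function name, the entry layout, and the choice to set the flag only when entries are actually dropped are assumptions made for illustration.

```python
MAX_EXCLUSIONS = 200  # cap stated in the text


def cap_exclusions(exclusions: list) -> dict:
    """Keep at most MAX_EXCLUSIONS entries and record whether any were dropped."""
    return {
        "exclusions": exclusions[:MAX_EXCLUSIONS],
        # Assumption: the flag is set only when entries were actually cut off.
        "exclusions_truncated": len(exclusions) > MAX_EXCLUSIONS,
        "total_exclusion_count": len(exclusions),
    }


# 250 hypothetical exclusions, all BeyondLimit for simplicity.
candidates = [{"hash": f"grain-{i}", "reason": "BeyondLimit"} for i in range(250)]
capped = cap_exclusions(candidates)
```

Keeping the total count alongside the truncated list lets a reader of the record tell how many candidates were filtered even when only the first 200 reasons are retained.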

Reason	Description
`ProcessingRestricted`	User’s processing restricted (GDPR Art. 18)
`Superseded`	Replaced by a newer version
`BelowConfidence`	Below query confidence threshold
`BelowMinScore`	Below minimum score threshold
`BelowImportance`	Below importance threshold
`Contradicted`	Penalized as a contradiction
`ContradictedFiltered`	Non-preferred side of contradiction pair
`TagMismatch`	Does not match required tag filters
`TypeMismatch`	Grain type does not match
`NamespaceMismatch`	Namespace does not match
`UserIdMismatch`	User ID does not match
`SubjectMismatch`	Subject does not contain required substring
`ObjectMismatch`	Object does not contain required substring
`OutsideTimeRange`	Falls outside requested time range
`TtlExpired`	Exceeded policy TTL ceiling
`DiversityFiltered`	Removed by diversity reranking
`BeyondLimit`	Beyond the result limit
`Deduplicated`	Near-duplicate of canonical grain
`ConflictResolved`	Newer grain with same (subject, relation) preferred
`SupersessionDemoted`	Demoted by supersession-aware scoring
`NamespaceCapped`	Namespace capped by max_namespaces
`RelationMismatch`	Planner promoted `relation` filter; grain’s relation does not match (QRY-E001 semantics)
`SubjectExactMismatch`	Planner promoted exact `subject` filter; grain’s subject does not match
`ObjectExactMismatch`	Planner promoted exact `object` filter; grain’s object does not match
`EntityMismatch`	Planner promoted `entity` filter; grain’s subject ≠ entity AND object ≠ entity
`SubjectInMismatch`	Planner promoted `subject_in` filter; grain’s subject not in allow-list
`RelationInMismatch`	Planner promoted `relation_in` filter; grain’s relation not in allow-list
`ObjectInMismatch`	Planner promoted `object_in` filter; grain’s object not in allow-list
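
When reviewing a record, it can help to roll the individual reasons up into the filter categories named earlier in this section. The sketch below copies that grouping into a lookup table; the record layout (a list of dictionaries with a reason key) and the helper names are hypothetical.

```python
from collections import Counter

# Category membership copied from the grouping described in this section.
CATEGORIES = {
    "regulatory": {"ProcessingRestricted", "TtlExpired"},
    "quality": {"BelowConfidence", "BelowMinScore", "BelowImportance"},
    "semantic": {
        "Contradicted", "ContradictedFiltered", "Superseded",
        "ConflictResolved", "SupersessionDemoted", "Deduplicated",
    },
    "structural": {
        "TagMismatch", "TypeMismatch", "NamespaceMismatch", "UserIdMismatch",
        "SubjectMismatch", "ObjectMismatch", "OutsideTimeRange", "NamespaceCapped",
    },
    "planner": {
        "RelationMismatch", "SubjectExactMismatch", "ObjectExactMismatch",
        "EntityMismatch", "SubjectInMismatch", "RelationInMismatch", "ObjectInMismatch",
    },
    "ranking": {"DiversityFiltered", "BeyondLimit"},
}


def category_of(reason: str) -> str:
    """Return the filter category for a reason, or 'unknown' if unlisted."""
    for name, members in CATEGORIES.items():
        if reason in members:
            return name
    return "unknown"


def summarize(exclusions: list) -> Counter:
    """Count exclusions per category."""
    return Counter(category_of(e["reason"]) for e in exclusions)


# Hypothetical exclusion entries.
summary = summarize([
    {"hash": "g1", "reason": "TtlExpired"},
    {"hash": "g2", "reason": "BeyondLimit"},
    {"hash": "g3", "reason": "DiversityFiltered"},
])
```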

How does Areev protect provenance privacy?

Provenance records are designed for data minimization: they never contain raw grain content. When encryption is active, records are AES-256-GCM encrypted at rest with an HKDF-derived key, and query parameter fields (subject, relation, object, user_id) are stored as HMAC-SHA256 blind tokens. As a result, even the structured search terms kept in the record cannot be read in plaintext.

The encryption key for provenance records derives from the master key via HKDF("areev-provenance-key"), separate from user DEKs. The blind key for SPO fields uses a dedicated derivation path, preventing cross-domain key reuse. Query text is excluded from records (it may contain PII), and embedding vectors are excluded (large and not useful for explanation). Actor IDs are pseudonymized via the audit trail’s HMAC mechanism.

Privacy protections:
  1. No raw grain content — only hashes and scores
  2. SPO fields -> HMAC-SHA256 blind tokens (when encrypted)
  3. Actor ID -> pseudonymized via audit trail HMAC
  4. Query text excluded (may contain PII)
  5. Embedding vectors excluded
  6. Records encrypted at rest with HKDF-derived key
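
The key separation and blinding described above can be illustrated with the standard library alone. This is a minimal sketch, not Areev's implementation: it assumes the "areev-provenance-key" label is passed as the HKDF info parameter, uses a made-up label for the blind key (the real derivation path is not given), and substitutes an all-zero placeholder for the master key.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it with `info`."""
    hash_len = hashlib.sha256().digest_size
    # Extract step; an empty salt defaults to hash_len zero bytes per the RFC.
    prk = hmac.new(salt or b"\x00" * hash_len, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # Expand step: T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def blind_token(blind_key: bytes, value: str) -> str:
    """Deterministic HMAC-SHA256 token standing in for a plaintext field value."""
    return hmac.new(blind_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


master = bytes(32)  # placeholder master key, for illustration only
provenance_key = hkdf_sha256(master, b"areev-provenance-key")
blind_key = hkdf_sha256(master, b"example-blind-key")  # hypothetical label
token = blind_token(blind_key, "alice")
```

Deriving each key under a distinct label means a leak of one derived key does not expose the other, and the deterministic token still allows equality matching on a blinded field without revealing its value.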

How does provenance link to the audit trail?

Every provenance record contains the audit_entry_hash of its corresponding MemoryRecalled audit event, forming a cryptographic cross-reference. Areev verifies the link in both directions to detect orphaned records.

The linking sequence runs as follows: recall() executes and builds a ProvenanceRecord, the audit trail appends a MemoryRecalled event, the audit entry hash is set on the record, and the record is stored. The verify_provenance_links() function checks both directions — every MemoryRecalled audit entry has a provenance record, and every provenance record has an audit entry. Records default to 180-day retention per EU AI Act Art. 19, configurable per namespace or globally. The cleanup_expired() method removes records beyond their retention period.
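
The two-direction check can be pictured as a pair of set comparisons. The sketch below is not the verify_provenance_links() implementation; it uses hypothetical in-memory lists and assumed field names (hash, event) for audit entries to show what each direction detects.

```python
def verify_links(audit_entries: list, provenance_records: list) -> dict:
    """Report audit hashes lacking a record and record IDs lacking an audit entry."""
    recalled = {e["hash"] for e in audit_entries if e["event"] == "MemoryRecalled"}
    linked = {r["audit_entry_hash"] for r in provenance_records}
    return {
        # Direction 1: every MemoryRecalled audit entry has a provenance record.
        "audit_without_provenance": sorted(recalled - linked),
        # Direction 2: every provenance record points at a real audit entry.
        "provenance_without_audit": sorted(
            r["recall_id"] for r in provenance_records
            if r["audit_entry_hash"] not in recalled
        ),
    }


# Hypothetical data: one healthy link and one orphan on each side.
audit = [
    {"hash": "a1", "event": "MemoryRecalled"},
    {"hash": "a2", "event": "MemoryRecalled"},
    {"hash": "a3", "event": "MemoryStored"},
]
records = [
    {"recall_id": "r1", "audit_entry_hash": "a1"},
    {"recall_id": "r9", "audit_entry_hash": "missing"},
]
report = verify_links(audit, records)
```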

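import requests

# Get provenance summary (link statistics); the request carries the same
# bearer token as the raw HTTP example that follows.
resp = requests.get(
    "https://acme.areev.ai/api/memories/default/provenance/summary",
    headers={"Authorization": "Bearer ar_..."},
    timeout=10,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
result = resp.json()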

GET /api/memories/default/provenance/summary HTTP/1.1
Host: acme.areev.ai
Authorization: Bearer ar_...

Provenance-audit cross-references are verified as part of compliance verification — running POST /api/memories/{id}/verify/run checks that provenance records link correctly to their audit-trail entries.
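
The retention rule mentioned earlier in this section (180 days by default, configurable per namespace or globally) can be sketched as a simple partition by age. This is an illustration under assumed names and record layout, not the cleanup_expired() implementation.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 180  # default stated in the text


def retention_for(namespace: str, overrides: dict,
                  global_days: int = DEFAULT_RETENTION_DAYS) -> timedelta:
    """A per-namespace override wins over the global setting."""
    return timedelta(days=overrides.get(namespace, global_days))


def split_expired(records: list, now: datetime, overrides: dict) -> tuple:
    """Partition records into (kept, expired) by age relative to `now`."""
    kept, expired = [], []
    for rec in records:
        age = now - rec["timestamp"]
        limit = retention_for(rec["namespace"], overrides)
        (expired if age > limit else kept).append(rec)
    return kept, expired


# Hypothetical records; the "research" namespace keeps records for a year.
now = datetime(2025, 7, 1, tzinfo=timezone.utc)
records = [
    {"recall_id": "r1", "namespace": "default", "timestamp": now - timedelta(days=200)},
    {"recall_id": "r2", "namespace": "default", "timestamp": now - timedelta(days=30)},
    {"recall_id": "r3", "namespace": "research", "timestamp": now - timedelta(days=200)},
]
kept, expired = split_expired(records, now, overrides={"research": 365})
```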

Related

EU AI Act: Art. 12 record-keeping and Art. 86 explanation
Audit Trail: Hash-chained audit entries
Encryption: Encryption of provenance records
