The production problem
Agent systems recompute the same memory again and again.
Long tasks repeat system prompts, tool schemas, retrieval context, repo maps, policy
documents, and trace history. The result is avoidable prefill latency, GPU memory
pressure, and token spend that is difficult to explain outside the infrastructure team.
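To make that cost concrete, here is a rough sketch that counts the prompt tokens prefilled over one task, first when the stable block is resent every turn and then when it is computed once and reused. The function names and all token counts are hypothetical illustrations, not measurements, and the model assumes only the stable block is reused.

```python
def prefill_without_reuse(stable_tokens: int, dynamic_per_turn: int, turns: int) -> int:
    """Every turn re-processes the stable block plus the dynamic history so far."""
    return sum(stable_tokens + dynamic_per_turn * t for t in range(1, turns + 1))


def prefill_with_prefix_reuse(stable_tokens: int, dynamic_per_turn: int, turns: int) -> int:
    """The stable block is prefilled once; only dynamic history is re-processed."""
    return stable_tokens + sum(dynamic_per_turn * t for t in range(1, turns + 1))


# Hypothetical workload: 6,000 stable tokens (system prompt, tool schemas),
# 500 new dynamic tokens per turn, 20 turns in the task.
STABLE, DYNAMIC, TURNS = 6_000, 500, 20

baseline = prefill_without_reuse(STABLE, DYNAMIC, TURNS)
reused = prefill_with_prefix_reuse(STABLE, DYNAMIC, TURNS)
saved = baseline - reused  # equals STABLE * (TURNS - 1) under this model

print(f"baseline={baseline} reused={reused} saved={saved}")
```

Under these assumptions the saving grows linearly with the number of turns, which is why long agent tasks are hit hardest.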
Repeated region            | Failure mode                      | Runtime pressure             | Evanitas action
---------------------------+-----------------------------------+------------------------------+------------------------------------------
System and policy prompts  | Stable context charged every turn | Prefill latency              | Promote to reusable prefix memory
Tool schemas and examples  | Repeated schema injection         | Input tokens and cache churn | Compile into cache-friendly layout
Retrieval and repo context | Minor changes invalidate reuse    | KV memory pressure           | Separate stable source from dynamic query
Agent trace history        | Growing context window            | Cost per completed task      | Fold repeated traces into memory regions
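The idea of separating stable source from the dynamic query can be sketched generically. The example below is not the Evanitas API: `build_prompt` and `prefix_key` are hypothetical helpers, and the SHA-256 key merely stands in for whatever identity a prefix cache might use. It assumes reuse depends on the leading bytes of the prompt being identical across turns.

```python
import hashlib
import json


def build_prompt(system_prompt: str, tool_schemas: dict, query: str) -> tuple[str, str]:
    """Place stable regions first and the per-turn query last.

    Schemas are serialized with sorted keys so that the same content always
    produces the same bytes, regardless of dict insertion order.
    """
    stable = system_prompt + "\n" + json.dumps(tool_schemas, sort_keys=True)
    return stable, stable + "\n" + query


def prefix_key(stable_prefix: str) -> str:
    """Hypothetical cache key: a hash of the stable prefix bytes."""
    return hashlib.sha256(stable_prefix.encode("utf-8")).hexdigest()


system = "You are a code review agent."
schemas_turn1 = {"search": {"args": ["query"]}, "read_file": {"args": ["path"]}}
schemas_turn2 = {"read_file": {"args": ["path"]}, "search": {"args": ["query"]}}  # same content, different order

stable1, prompt1 = build_prompt(system, schemas_turn1, "Summarize the open PRs.")
stable2, prompt2 = build_prompt(system, schemas_turn2, "Which tests are flaky?")

key1, key2 = prefix_key(stable1), prefix_key(stable2)
print(key1 == key2)  # the stable prefix is byte-identical across both turns
```

Putting the query last means a change to the query never alters the bytes before it, so the stable region remains a candidate for reuse from turn to turn.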