The AI industry has spent the past two years building better language models, better retrieval systems and better agent frameworks.
Yet every production AI application still rebuilds the same infrastructure. Every engineering team writes custom code to decide:
These decisions are currently embedded in application code. They should be infrastructure.
Context Runtime is a provider-agnostic optimization layer that automatically determines the optimal context execution plan before any model is called.
Applications describe intent. The runtime determines the initial execution strategy. Over time, it continuously improves those decisions using measured execution outcomes. Cost-based planning gets you the first execution; learning gets you the thousandth.
Current AI systems are built around static pipelines — User → Prompt → RAG → Model → Answer. Every serious application eventually extends that pipeline: hybrid search, BM25, GraphRAG, conversation memory, prompt caching, summarization, agent routing, verification, model selection, policy enforcement.
Every feature introduces another branch of application logic. As systems become more capable, developers spend less time building products and more time maintaining context pipelines. That maintenance is becoming one of the largest engineering costs of production AI.
Today's leading AI companies optimize different parts of the stack:
These approaches are complementary. None provides a unified runtime responsible for deciding how context should flow through an AI system — and learning to route it better over time.
Context management should become an operating-system service rather than application code. Applications should no longer decide retrieval strategy, chunk size, reranking, prompt assembly, model routing, verification, compression or caching.
Applications should express intent. The runtime should determine the optimal execution plan — and refine that plan as it observes what actually works.
This is what query planners did for relational databases. Developers write SQL; the database determines how to execute it. Context Runtime applies the same principle to AI and, like adaptive query optimizers, uses execution feedback and statistics to improve with use.
Instead of manually assembling prompts, applications submit goals. The runtime evaluates multiple strategies while respecting application constraints — latency, cost, security, token budget, verification requirements and provider capabilities. Only then does execution begin.
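A minimal sketch of constraint-aware strategy selection, written in Python, may make this concrete. Everything in it is hypothetical and chosen for illustration: the `Constraints` and `Strategy` types, the `plan` function, the strategy names and every estimate are assumptions, not the project's API or measured figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Application-supplied limits (hypothetical field names)."""
    max_latency_ms: float
    max_cost_usd: float
    max_tokens: int
    require_verification: bool = False


@dataclass(frozen=True)
class Strategy:
    """A candidate plan with made-up cost-model estimates."""
    name: str
    est_latency_ms: float
    est_cost_usd: float
    est_tokens: int
    est_quality: float  # 0..1, higher is better
    verifies: bool = False


def feasible(s: Strategy, c: Constraints) -> bool:
    """True if the strategy's estimates fit inside every constraint."""
    return (
        s.est_latency_ms <= c.max_latency_ms
        and s.est_cost_usd <= c.max_cost_usd
        and s.est_tokens <= c.max_tokens
        and (s.verifies or not c.require_verification)
    )


def plan(candidates: list[Strategy], constraints: Constraints) -> Strategy:
    """Pick the feasible strategy with the best estimated quality.

    Ties are broken in favour of the cheaper strategy.
    """
    ok = [s for s in candidates if feasible(s, constraints)]
    if not ok:
        raise ValueError("no strategy satisfies the constraints")
    return max(ok, key=lambda s: (s.est_quality, -s.est_cost_usd))


# Illustrative candidates; the numbers are invented, not benchmarks.
CANDIDATES = [
    Strategy("bm25_only", 120, 0.002, 2000, 0.62),
    Strategy("hybrid_rerank", 450, 0.010, 6000, 0.81),
    Strategy("graphrag_verified", 1800, 0.060, 12000, 0.90, verifies=True),
]

chosen = plan(CANDIDATES, Constraints(max_latency_ms=500, max_cost_usd=0.02, max_tokens=8000))
print(chosen.name)
```

The point of the sketch is the ordering: constraints filter the candidate set first, the cost model ranks what remains, and no model or retriever is called until a plan has been chosen.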
And execution is not the end of the loop. Each run produces a measured outcome — a reward — that updates the runtime's cost model and strategy selection, so the next plan for a similar intent is better than the last.
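One simple way to close that loop is an epsilon-greedy selector with an exponentially weighted reward estimate per strategy. The source does not say which learning method the runtime uses, so this Python sketch is an assumption; `StrategySelector`, its parameters and the intent and strategy labels are hypothetical names.

```python
import random
from collections import defaultdict


class StrategySelector:
    """Epsilon-greedy choice over strategies, keyed by intent kind.

    An illustrative stand-in for the runtime's feedback loop,
    not the project's actual learning algorithm.
    """

    def __init__(self, strategies, epsilon=0.1, alpha=0.2, seed=None):
        self.strategies = list(strategies)
        self.epsilon = epsilon  # probability of exploring a random strategy
        self.alpha = alpha      # learning rate for the moving average
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (intent_kind, strategy) -> estimated reward

    def choose(self, intent_kind):
        """Explore with probability epsilon, otherwise pick the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.value[(intent_kind, s)])

    def record(self, intent_kind, strategy, reward):
        """Move the estimate a fraction alpha toward the observed reward."""
        key = (intent_kind, strategy)
        self.value[key] += self.alpha * (reward - self.value[key])


selector = StrategySelector(["bm25_only", "hybrid_rerank"], epsilon=0.0, seed=0)
selector.record("qa", "hybrid_rerank", reward=1.0)  # one measured outcome
print(selector.choose("qa"))
```

With exploration switched off (`epsilon=0.0`), a single positive reward is enough to make `hybrid_rerank` the preferred strategy for `"qa"` intents; a production system would presumably keep some exploration and a richer reward signal.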
Context Runtime separates planning from execution. The planner produces an Execution Graph; a pluggable scheduler executes that graph. Foundation models and retrieval systems become interchangeable plugins, verification becomes a runtime policy — and the application remains unchanged.
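The plugin idea can be sketched with structural typing in Python. The `ModelProvider` protocol, the two toy providers, the `non_empty` policy and the `execute` function are all hypothetical and stand in for whatever interfaces the reference implementation actually defines.

```python
from typing import Callable, Optional, Protocol


class ModelProvider(Protocol):
    """Hypothetical plugin interface: anything with complete() qualifies."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Toy provider that echoes the prompt back."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperProvider:
    """Toy provider that upper-cases the prompt."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def non_empty(answer: str) -> bool:
    """A trivial verification policy: reject blank answers."""
    return bool(answer.strip())


def execute(
    prompt: str,
    provider: ModelProvider,
    verify: Optional[Callable[[str], bool]] = None,
) -> str:
    """Run one model call; verification is a policy supplied by the runtime."""
    answer = provider.complete(prompt)
    if verify is not None and not verify(answer):
        raise ValueError("answer failed verification policy")
    return answer


# The calling code is identical regardless of which provider is plugged in.
print(execute("hi", EchoProvider()))
print(execute("hi", UpperProvider(), verify=non_empty))
```

Swapping `EchoProvider` for `UpperProvider`, or adding a verification policy, changes nothing in the caller, which is the property the paragraph above describes.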
The runtime emits a backend-independent Execution Graph. The reference implementation includes an in-process scheduler and defines interfaces for external orchestration and execution backends such as Dagster, Ray and Spark, so the same plan can run locally today and on distributed backends later without changing the planner or the application.
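As a rough illustration of a backend-independent graph with a swappable scheduler, the Python sketch below runs a three-step plan in process using the standard library's `graphlib`. The `ExecutionGraph`, `Scheduler` and `InProcessScheduler` names and the step functions are assumptions made for this example, not the project's actual types.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Protocol


class ExecutionGraph:
    """Hypothetical plan format: named steps plus their upstream dependencies."""

    def __init__(self) -> None:
        self.steps: dict[str, Callable[..., Any]] = {}
        self.deps: dict[str, tuple[str, ...]] = {}

    def add(self, name: str, fn: Callable[..., Any], deps=()) -> None:
        self.steps[name] = fn
        self.deps[name] = tuple(deps)


class Scheduler(Protocol):
    """Any backend that can run a graph; a distributed adapter would implement this too."""

    def run(self, graph: ExecutionGraph) -> dict[str, Any]: ...


class InProcessScheduler:
    """Runs steps sequentially in dependency order inside the current process."""

    def run(self, graph: ExecutionGraph) -> dict[str, Any]:
        results: dict[str, Any] = {}
        # static_order() yields every step after all of its dependencies.
        for name in TopologicalSorter(graph.deps).static_order():
            inputs = [results[d] for d in graph.deps[name]]
            results[name] = graph.steps[name](*inputs)
        return results


# A stubbed plan: retrieval feeds prompt assembly, which feeds a fake model call.
graph = ExecutionGraph()
graph.add("retrieve", lambda: ["doc-a", "doc-b"])
graph.add("assemble", lambda docs: "Context: " + ", ".join(docs), deps=["retrieve"])
graph.add("answer", lambda prompt: f"[stub model] {prompt}", deps=["assemble"])

results = InProcessScheduler().run(graph)
print(results["answer"])
```

Because the planner only produces the graph, a different scheduler could in principle map the same steps onto another backend without the planner or the application changing.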
Retrieval is not a single fixed strategy but a set of runtime primitives the planner selects and cost-models per request:
Today's AI applications are tightly coupled to providers and frameworks. Tomorrow's applications will depend on infrastructure that automatically determines:
Context Runtime becomes that infrastructure layer.
The project begins as an AGPL-licensed reference implementation. Its purpose is to validate the architecture and establish an open specification for context management. The initial implementation focuses on:
The goal is not another agent framework. The goal is to define a new infrastructure abstraction. Multiple implementations of that abstraction are possible; a Python reference implementation exists today.
As AI workloads become larger and increasingly distributed, context planning itself becomes a distributed-systems problem. The long-term roadmap covers distributed planning and execution across Kubernetes, Dagster, Spark, Ray and future execution backends. This enables:
The planner remains provider-independent while execution engines continue to evolve independently.
Every production AI system already performs context planning. Most simply do it manually. The industry has standardized foundation models. It has standardized tool calling. It has standardized retrieval. It has not standardized context optimization.
That missing layer represents an opportunity to define a new category of AI infrastructure.
Compilers removed the need to write assembly. Operating systems removed the need to manage hardware directly. Database query planners removed the need to handcraft execution strategies — and adaptive optimizers later removed the need to hand-tune them as data changed.
Context Runtime brings the same arc to AI. Applications describe intent. The runtime determines the optimal execution plan, and learns from every execution. Everything else becomes implementation.