Private pitch · redevops.io

Context Runtime

A Cost-Based, Self-Optimizing Runtime for AI Context Management

Applications describe intent. The runtime determines the execution strategy — and improves it with every execution.

Executive Summary

The AI industry has spent the past two years building better language models, better retrieval systems and better agent frameworks.

Yet every production AI application still rebuilds the same infrastructure. Every engineering team writes custom code to decide:

What information should reach the model?
Which retrieval strategy should be used?
Which model should execute the task?
Should previous context be retrieved, compressed or discarded?
Should another model verify the answer?
Should work be delegated to multiple agents?

These decisions are currently embedded in application code. They should be infrastructure.

Context Runtime is a provider-agnostic optimization layer that automatically determines the optimal context execution plan before any model is called.

Applications describe intent. The runtime determines the initial execution strategy. Over time, it continuously improves those decisions using measured execution outcomes. Cost-based planning gets you the first execution; learning gets you the thousandth.

The Problem

Current AI systems are built around static pipelines — User → Prompt → RAG → Model → Answer. Every serious application eventually extends that pipeline: hybrid search, BM25, GraphRAG, conversation memory, prompt caching, summarization, agent routing, verification, model selection, policy enforcement.

Every feature introduces another branch of application logic. As systems become more capable, developers spend less time building products and more time maintaining context pipelines. In our view, this is becoming one of the dominant engineering costs of production AI.

Every capability bolted onto the pipeline becomes another branch of application code — the maintenance burden, not the product.

Existing Approaches Solve Different Problems

Today's leading AI companies optimize different parts of the stack:

Classical RAG reduces inference cost by sending the model only the retrieved passages instead of the full corpus.
DeepSeek reduces attention cost by making models process long context more efficiently.
Anthropic manages the lifecycle of context across long-running conversations and agents.
OpenAI increasingly distributes reasoning across orchestration, planning and verification.

These approaches are complementary. None provides a unified runtime responsible for deciding how context should flow through an AI system — and learning to route it better over time.

Our Thesis

Context management should become an operating-system service rather than application code. Applications should no longer decide retrieval strategy, chunk size, reranking, prompt assembly, model routing, verification, compression or caching.

Applications should express intent. The runtime should determine the optimal execution plan — and refine that plan as it observes what actually works.

This is exactly what modern database query planners did for relational databases. Developers write SQL; the database determines how to execute it. Context Runtime applies the same principle to AI — and, like adaptive query optimizers, it uses execution feedback and statistics to get better with use.

The query-planner abstraction, brought to AI: you submit a goal, the runtime plans and runs the optimal execution graph — and, like an adaptive optimizer, tunes it from feedback.

The Context Runtime

Instead of manually assembling prompts, applications submit goals. The runtime evaluates multiple strategies while respecting application constraints — latency, cost, security, token budget, verification requirements and provider capabilities. Only then does execution begin.
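The selection step described above can be sketched as a constrained choice over candidate plans. This is a minimal illustration, not the runtime's actual planner: the names `Constraints`, `CandidatePlan` and `choose_plan`, the scoring rule (estimated quality minus weighted cost) and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Hypothetical application constraints attached to an intent."""
    max_latency_ms: float
    max_cost_usd: float
    require_verification: bool = False


@dataclass(frozen=True)
class CandidatePlan:
    """Hypothetical candidate strategy with cost-model estimates."""
    name: str
    est_latency_ms: float
    est_cost_usd: float
    est_quality: float  # estimated answer quality in [0, 1]
    verifies: bool = False


def choose_plan(candidates, constraints, cost_weight=1.0):
    """Drop infeasible plans, then return the best quality-minus-cost score."""
    feasible = [
        p for p in candidates
        if p.est_latency_ms <= constraints.max_latency_ms
        and p.est_cost_usd <= constraints.max_cost_usd
        and (p.verifies or not constraints.require_verification)
    ]
    if not feasible:
        raise ValueError("no candidate plan satisfies the constraints")
    return max(feasible, key=lambda p: p.est_quality - cost_weight * p.est_cost_usd)


# Illustrative estimates only; a real cost model would supply these.
candidates = [
    CandidatePlan("bm25_small_model", 300, 0.002, 0.70),
    CandidatePlan("hybrid_large_model", 1200, 0.030, 0.88),
    CandidatePlan("hybrid_large_model_verified", 2500, 0.055, 0.93, verifies=True),
]
plan = choose_plan(candidates, Constraints(max_latency_ms=1500, max_cost_usd=0.05))
print(plan.name)
```

Constraints act as hard filters and the score only ranks what remains, so a tighter latency budget changes which plans are eligible rather than how they are scored.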

And execution is not the end of the loop. Each run produces a measured outcome — a reward — that updates the runtime's cost model and strategy selection, so the next plan for a similar intent is better than the last.
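One simple way to picture this feedback loop is a running average of rewards per (intent, strategy) pair, with occasional exploration. The sketch below uses epsilon-greedy selection as a stand-in; the document does not specify the learning method, and `StrategyStats`, its methods and the reward values are hypothetical.

```python
import random
from collections import defaultdict


class StrategyStats:
    """Tracks the mean observed reward for each (intent, strategy) pair."""

    def __init__(self, strategies, epsilon=0.1, seed=0):
        self.strategies = list(strategies)
        self.epsilon = epsilon            # probability of exploring
        self.rng = random.Random(seed)
        self.count = defaultdict(int)
        self.mean = defaultdict(float)    # unseen pairs start at 0.0

    def select(self, intent):
        """Mostly pick the best-known strategy; sometimes explore at random."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.mean[(intent, s)])

    def record(self, intent, strategy, reward):
        """Fold one measured outcome into the running mean."""
        key = (intent, strategy)
        self.count[key] += 1
        self.mean[key] += (reward - self.mean[key]) / self.count[key]


# epsilon=0.0 disables exploration so the example is deterministic.
stats = StrategyStats(["bm25", "hybrid"], epsilon=0.0)
stats.record("faq", "bm25", 0.6)
stats.record("faq", "hybrid", 0.9)
stats.record("faq", "hybrid", 0.7)
print(stats.select("faq"))
```

How the reward is defined (answer quality, cost, latency or a blend) is left open here; the sketch only shows how measured outcomes could shift the next selection for a similar intent.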

Intent in, a cost-optimized execution graph out — verified before a single token is generated, and improved after every run.

Architecture

Context Runtime separates planning from execution. The planner produces an Execution Graph; a pluggable scheduler executes that graph. Foundation models and retrieval systems become interchangeable plugins, verification becomes a runtime policy — and the application remains unchanged.

The runtime emits a backend-independent Execution Graph. The reference implementation includes an in-process scheduler and defines interfaces for distributed execution engines such as Dagster, Ray and Spark — so the same plan can run locally today and on distributed backends tomorrow without changing the planner or the application.
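The split between a backend-independent plan and a pluggable scheduler can be illustrated with a small directed acyclic graph and an in-process runner. This is a sketch under assumptions: `ExecutionGraph`, `run_in_process` and the step names are hypothetical and are not the reference implementation's API. Only the standard-library `graphlib.TopologicalSorter` (Python 3.9+) is a real interface.

```python
from graphlib import TopologicalSorter


class ExecutionGraph:
    """Hypothetical plan format: named steps plus their upstream dependencies."""

    def __init__(self):
        self.steps = {}  # step name -> callable taking a dict of upstream results
        self.deps = {}   # step name -> tuple of upstream step names

    def add(self, name, fn, after=()):
        self.steps[name] = fn
        self.deps[name] = tuple(after)


def run_in_process(graph):
    """Minimal scheduler: run each step once, in dependency order."""
    results = {}
    for name in TopologicalSorter(graph.deps).static_order():
        inputs = {dep: results[dep] for dep in graph.deps[name]}
        results[name] = graph.steps[name](inputs)
    return results


# Stand-in steps; real steps would call retrieval systems and models.
graph = ExecutionGraph()
graph.add("retrieve", lambda _: ["passage A", "passage B"])
graph.add("assemble", lambda i: "\n".join(i["retrieve"]), after=["retrieve"])
graph.add(
    "generate",
    lambda i: f"answer from {len(i['assemble'])} chars of context",
    after=["assemble"],
)
results = run_in_process(graph)
print(results["generate"])
```

Because the graph holds only step names, callables and dependencies, a different scheduler could walk the same structure and dispatch steps to a distributed engine without the planner or the application changing.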

Plan once, run anywhere: the planner is provider-independent and outcome-driven; models, retrieval and execution engines are swappable underneath it.

Retrieval as a first-class, routable capability

Retrieval is not a single fixed strategy but a set of runtime primitives the planner selects and cost-models per request:

Sparse (BM25): IDF-weighted lexical matching for exact, rare terms.
Dense (semantic embeddings): bridges synonyms and morphology across languages.
Hybrid: BM25 and dense rankings fused by Reciprocal Rank Fusion (RRF).
Graph / multi-hop: Personalized PageRank over a passage graph for connective questions.
Learned method routing: the runtime learns which retrieval method (and pool size, reranking, thresholds) wins for each intent, from measured outcomes.
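As an illustration of the hybrid primitive, Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over the rankings it appears in. The sketch below assumes the commonly used constant k = 60 and made-up document ids; the function name and tie-breaking rule are choices made for this example, not details taken from the runtime.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document's score is the sum of 1 / (k + rank) over every list
    that contains it, with ranks starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["d1", "d2", "d3"]    # hypothetical sparse results
dense_ranking = ["d3", "d1", "d4"]   # hypothetical dense results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only rank positions, so BM25 scores and embedding similarities never need to be put on a common scale before fusing.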

Retrieval becomes a routable decision: the runtime picks the method per request and learns which one wins for each kind of intent.

Why This Matters

Today's AI applications are tightly coupled to providers and frameworks. Tomorrow's applications will depend on infrastructure that automatically determines:

what information matters,
where it lives,
when it should be retrieved,
how it should be compressed,
who should process it,
how it should be verified,
and how to do all of the above better over time.

Context Runtime becomes that infrastructure layer.

Open Source First

The project begins as an AGPL-licensed reference implementation. Its purpose is to validate the architecture and establish an open specification for context management. The initial implementation focuses on:

provider-independent planning,
cost-based optimization,
outcome-driven (self-optimizing) planning,
execution graph generation,
retrieval optimization (sparse · dense · hybrid · graph · learned routing),
verification,
observability.

The goal is not another agent framework. The goal is to define a new infrastructure abstraction. Multiple implementations of that abstraction are possible; a Python reference implementation exists today.

Enterprise Vision

As AI workloads become larger and increasingly distributed, context planning itself becomes a distributed-systems problem. The long-term roadmap covers distributed planning and execution across Kubernetes, Dagster, Spark, Ray and future execution backends. This enables:

multi-node context planning,
high availability,
enterprise policy enforcement,
distributed plan caching,
execution tracing,
large-scale optimization,
multi-tenant deployments.

The planner remains provider-independent while execution engines continue to evolve independently.

Market Opportunity

Every production AI system already performs context planning. Most simply do it manually. The industry has largely converged on common interfaces for foundation models, for tool calling and for retrieval. It has not standardized context optimization.

That missing layer represents an opportunity to define a new category of AI infrastructure.

Vision

Compilers removed the need to write assembly. Operating systems removed the need to manage hardware directly. Database query planners removed the need to handcraft execution strategies — and adaptive optimizers later removed the need to hand-tune them as data changed.

Context Runtime brings the same arc to AI. Applications describe intent. The runtime determines the optimal execution plan, and learns from every execution. Everything else becomes implementation.

Context Runtime is not another AI framework. It is the infrastructure layer that makes AI systems simpler, cheaper, more reliable, provider-independent — and self-improving.