Private pitch · redevops.io

Context Runtime

A Cost-Based, Self-Optimizing Runtime for AI Context Management
Applications describe intent. The runtime determines the execution strategy — and improves it with every execution.

Executive Summary

The AI industry has spent the past two years building better language models, better retrieval systems and better agent frameworks.

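Yet every production AI application still rebuilds the same infrastructure. Every engineering team writes custom code to decide retrieval strategy, chunk size, reranking, prompt assembly, model routing, verification, compression and caching.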

These decisions are currently embedded in application code. They should be infrastructure.

Context Runtime is a provider-agnostic optimization layer that automatically determines the optimal context execution plan before any model is called.

Applications describe intent. The runtime determines the initial execution strategy. Over time, it continuously improves those decisions using measured execution outcomes. Cost-based planning gets you the first execution; learning gets you the thousandth.

The Problem

Current AI systems are built around static pipelines — User → Prompt → RAG → Model → Answer. Every serious application eventually extends that pipeline: hybrid search, BM25, GraphRAG, conversation memory, prompt caching, summarization, agent routing, verification, model selection, policy enforcement.

Every feature introduces another branch of application logic. As systems become more capable, developers spend less time building products and more time maintaining context pipelines. This is rapidly becoming the dominant engineering cost of production AI.

Figure: The static pipeline and the sprawl it grows. User → Prompt → RAG → Model → Answer. Every serious app extends the pipeline, all embedded in application code: hybrid search, BM25, GraphRAG, conversation memory, prompt caching, summarization, agent routing, verification, model selection, policy enforcement.
Every capability bolted onto the pipeline becomes another branch of application code — the maintenance burden, not the product.

Existing Approaches Solve Different Problems

Today's leading AI companies optimize different parts of the stack.

These approaches are complementary. None provides a unified runtime responsible for deciding how context should flow through an AI system — and learning to route it better over time.

Our Thesis

Context management should become an operating-system service rather than application code. Applications should no longer decide retrieval strategy, chunk size, reranking, prompt assembly, model routing, verification, compression or caching.

Applications should express intent. The runtime should determine the optimal execution plan — and refine that plan as it observes what actually works.

This is exactly what modern database query planners did for relational databases. Developers write SQL; the database determines how to execute it. Context Runtime applies the same principle to AI — and, like adaptive query optimizers, it uses execution feedback and statistics to get better with use.

Figure: The same abstraction databases have had since 1979. You write intent; the planner decides execution and adapts it with use.
Relational database: SQL → Query planner → Execution plan → Result
AI system: Goal → Context Runtime → Execution graph → Verified result
The query-planner abstraction, brought to AI: you submit a goal, the runtime plans and runs the optimal execution graph — and, like an adaptive optimizer, tunes it from feedback.

The Context Runtime

Instead of manually assembling prompts, applications submit goals. The runtime evaluates multiple strategies while respecting application constraints — latency, cost, security, token budget, verification requirements and provider capabilities. Only then does execution begin.
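The selection step described above can be sketched as a filter-then-score pass over candidate plans. This is a minimal illustration, not the runtime's actual planner: the `Constraints` and `CandidatePlan` types, the linear scoring formula, the weights and all the numeric estimates are assumptions made up for the example, and only three of the listed constraints are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Application limits the planner must respect (hypothetical fields)."""
    max_latency_ms: float
    max_cost_usd: float
    max_tokens: int


@dataclass(frozen=True)
class CandidatePlan:
    """One strategy with its estimated cost profile (illustrative numbers)."""
    name: str
    est_latency_ms: float
    est_cost_usd: float
    est_tokens: int
    est_quality: float  # 0..1, higher is better


def feasible(plan: CandidatePlan, limits: Constraints) -> bool:
    """A plan is feasible only if every estimate fits inside its limit."""
    return (
        plan.est_latency_ms <= limits.max_latency_ms
        and plan.est_cost_usd <= limits.max_cost_usd
        and plan.est_tokens <= limits.max_tokens
    )


def score(plan: CandidatePlan, limits: Constraints,
          cost_weight: float = 0.3, latency_weight: float = 0.2) -> float:
    """Estimated quality minus normalized cost and latency penalties (assumed formula)."""
    return (
        plan.est_quality
        - cost_weight * plan.est_cost_usd / limits.max_cost_usd
        - latency_weight * plan.est_latency_ms / limits.max_latency_ms
    )


def choose_plan(candidates: list[CandidatePlan], limits: Constraints) -> CandidatePlan:
    """Drop plans that violate a constraint, then keep the best-scoring one."""
    viable = [p for p in candidates if feasible(p, limits)]
    if not viable:
        raise ValueError("no candidate plan satisfies the constraints")
    return max(viable, key=lambda p: score(p, limits))


limits = Constraints(max_latency_ms=2000, max_cost_usd=0.05, max_tokens=8000)
candidates = [
    CandidatePlan("bm25_small_model", 400, 0.004, 2500, 0.62),
    CandidatePlan("hybrid_rerank", 900, 0.015, 4000, 0.81),
    CandidatePlan("graph_multihop_verify", 3200, 0.060, 12000, 0.90),
]
best = choose_plan(candidates, limits)
print(best.name)
```

In this toy setup the highest-quality plan is rejected because it exceeds the latency, cost and token limits, so planning finishes before any model is called. A real cost model would presumably estimate these figures rather than hard-code them.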

And execution is not the end of the loop. Each run produces a measured outcome — a reward — that updates the runtime's cost model and strategy selection, so the next plan for a similar intent is better than the last.

Figure: Context Runtime: plan, execute, learn. Goal (the application's intent) → Intent analysis → Candidate plans (several strategies) → Cost optimization (score plans vs constraints) → Execution graph → Execution (pluggable scheduler) → Reward (measured outcome) → Bandit learning + cost-model update, which improves the next plan. Constraints: latency, cost, security, token budget, verification, provider capabilities.
Intent in, a cost-optimized execution graph out — verified before a single token is generated, and improved after every run.
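The learning half of the loop can be sketched as a multi-armed bandit that keeps a running mean reward per intent and strategy. The text names bandit learning but not a specific algorithm, so epsilon-greedy is an assumption chosen for brevity; the `StrategyBandit` class, the strategy names and the simulated success rates are all hypothetical.

```python
import random
from collections import defaultdict


class StrategyBandit:
    """Epsilon-greedy bandit with a running mean reward per (intent, strategy)."""

    def __init__(self, strategies, epsilon=0.1, seed=None):
        self.strategies = list(strategies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def select(self, intent):
        # Explore with probability epsilon, otherwise exploit the best-known strategy.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.means[(intent, s)])

    def update(self, intent, strategy, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        key = (intent, strategy)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]


# Simulated environment: hidden success rates per strategy (made-up values).
true_rates = {"bm25": 0.55, "hybrid": 0.80, "graph": 0.65}
bandit = StrategyBandit(true_rates, epsilon=0.1, seed=7)
outcomes = random.Random(7)

for _ in range(2000):
    strategy = bandit.select("factual_qa")
    reward = 1.0 if outcomes.random() < true_rates[strategy] else 0.0
    bandit.update("factual_qa", strategy, reward)

for name in true_rates:
    key = ("factual_qa", name)
    print(name, bandit.counts[key], round(bandit.means[key], 3))
```

Keying the statistics by intent lets a similar future request start from what was measured, which is the behavior the loop describes. A production system might prefer UCB or Thompson sampling and a richer reward signal.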

Architecture

Context Runtime separates planning from execution. The planner produces an Execution Graph; a pluggable scheduler executes that graph. Foundation models and retrieval systems become interchangeable plugins, verification becomes a runtime policy — and the application remains unchanged.

The runtime emits a backend-independent Execution Graph. The reference implementation includes an in-process scheduler and defines interfaces for distributed execution engines such as Dagster, Ray and Spark — so the same plan can run locally today and on distributed backends tomorrow without changing the planner or the application.
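The split between a backend-independent graph and a pluggable scheduler can be sketched as follows. All names here (`Step`, `ExecutionGraph`, `Scheduler`, `InProcessScheduler`) are hypothetical and do not reflect the reference implementation's actual interfaces; the in-process scheduler simply runs steps in topological order using the standard library's `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Protocol


@dataclass
class Step:
    """One node of the graph; `run` receives the results of its dependencies."""
    name: str
    run: Callable[[dict[str, Any]], Any]
    depends_on: tuple[str, ...] = ()


@dataclass
class ExecutionGraph:
    """Backend-independent description of what to run and in which order."""
    steps: dict[str, Step] = field(default_factory=dict)

    def add(self, step: Step) -> None:
        self.steps[step.name] = step


class Scheduler(Protocol):
    """Any backend that can execute a graph satisfies this interface."""
    def execute(self, graph: ExecutionGraph) -> dict[str, Any]: ...


class InProcessScheduler:
    """Runs every step sequentially in the current process."""

    def execute(self, graph: ExecutionGraph) -> dict[str, Any]:
        dependencies = {name: step.depends_on for name, step in graph.steps.items()}
        results: dict[str, Any] = {}
        # static_order() yields each step only after all of its dependencies.
        for name in TopologicalSorter(dependencies).static_order():
            step = graph.steps[name]
            inputs = {dep: results[dep] for dep in step.depends_on}
            results[name] = step.run(inputs)
        return results


graph = ExecutionGraph()
graph.add(Step("retrieve", lambda _: ["passage A", "passage B"]))
graph.add(Step("assemble", lambda r: " | ".join(r["retrieve"]), ("retrieve",)))
graph.add(Step("answer", lambda r: f"answer based on: {r['assemble']}", ("assemble",)))

scheduler: Scheduler = InProcessScheduler()
results = scheduler.execute(graph)
print(results["answer"])
```

Because the planner only produces the `ExecutionGraph`, a distributed backend could in principle implement the same `execute` method without changes to the planner or the application.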

Figure: Architecture, planning separated from execution.
Application: expresses intent, submits a Goal, stays unchanged
Context Runtime (the planner): provider-independent cost-based planning + outcome-driven learning; emits a backend-independent execution graph + verification (verification = a runtime policy)
Execution backends and interchangeable plugins: in-process, Dagster, Ray, Spark, models, retrieval
Plan once, run anywhere: the planner is provider-independent and outcome-driven; models, retrieval and execution engines are swappable underneath it.

Retrieval as a first-class, routable capability

Retrieval is not a single fixed strategy but a set of runtime primitives the planner selects and cost-models per request:

Figure: Retrieval, a routable, learned capability. A learned method router picks method, pool size, reranking and thresholds per intent from measured outcomes; measured outcomes update the router.
Sparse · BM25: IDF-weighted lexical match for exact, rare terms
Dense · embeddings: bridges synonyms and morphology across languages
Hybrid · RRF: BM25 ⊕ dense fused by Reciprocal-Rank Fusion
Graph · multi-hop: Personalized-PageRank over a passage graph
Retrieval becomes a routable decision: the runtime picks the method per request and learns which one wins for each kind of intent.
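As an illustration of the hybrid primitive, Reciprocal-Rank Fusion combines ranked lists by summing 1 / (k + rank) for each document across lists. The sketch below is a generic version of that formula, not the runtime's code: the function name, the document IDs and the constant k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["d3", "d1", "d2"]   # hypothetical sparse results
dense_ranking = ["d1", "d4", "d3"]  # hypothetical dense results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because the fusion uses only ranks, it needs no calibration between BM25 scores and embedding similarities, which is one reason it suits a router that mixes methods per request.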

Why This Matters

Today's AI applications are tightly coupled to providers and frameworks. Tomorrow's applications will depend on infrastructure that automatically determines:

Context Runtime becomes that infrastructure layer.

Open Source First

The project begins as an AGPL-licensed reference implementation. Its purpose is to validate the architecture and establish an open specification for context management. The initial implementation focuses on:

The goal is not another agent framework. The goal is defining a new infrastructure abstraction. Multiple implementations of that abstraction are possible; a Python reference implementation exists today.

Enterprise Vision

As AI workloads become larger and increasingly distributed, context planning itself becomes a distributed-systems problem. The long-term roadmap covers distributed planning and execution across Kubernetes, Dagster, Spark, Ray and future execution backends. This enables:

The planner remains provider-independent while execution engines continue to evolve independently.

Market Opportunity

Every production AI system already performs context planning. Most simply do it manually. The industry has standardized foundation models. It has standardized tool calling. It has standardized retrieval. It has not standardized context optimization.

That missing layer represents an opportunity to define a new category of AI infrastructure.

Vision

Compilers removed the need to write assembly. Operating systems removed the need to manage hardware directly. Database query planners removed the need to handcraft execution strategies — and adaptive optimizers later removed the need to hand-tune them as data changed.

Context Runtime brings the same arc to AI. Applications describe intent. The runtime determines the optimal execution plan, and learns from every execution. Everything else becomes implementation.

Context Runtime is not another AI framework. It is the infrastructure layer that makes AI systems simpler, cheaper, more reliable, provider-independent — and self-improving.
Context Runtime — private pitch. Not for distribution. © 2026 redevops.io  ·  Whitepaper v1.1