Agentic Orchestration Patterns
Designing the operating system for AI: Durable execution, memory kernels, and cognitive architectures.
Summary
- What it is: A blueprint for building reliable, long-running autonomous agents that survive infrastructure failures.
- Why now: As agents move from “chatbots” to “workers,” they need state persistence, retry logic, and structured memory.
- Who it’s for: Platform engineers building the “Agent Cloud” (e.g., Strategos).
The Core Problem: Stateless vs. Stateful
Traditional ML serving (Hyperion) is stateless: Input -> Model -> Output.
Agentic workloads are stateful loops: Observe -> Plan -> Act -> Observe.
The Challenge:
- Timeouts: Agent tasks (e.g., “Research X”) can take minutes or hours, far longer than an HTTP connection can reliably stay open.
- Failure: If a pod crashes mid-thought, the agent’s context is lost.
- Context Window: An ever-growing history eventually exceeds the model’s context window.
Pattern 1: Durable Execution (The Kernel)
Instead of storing “current state” in a mutable database row, we use Event Sourcing.
The Event Log
We persist every significant step as an immutable event:
- WorkflowStarted
- ActivityScheduled (tool call)
- ActivityCompleted (tool result)
- TimerFired
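Here is a minimal sketch of such a log. It assumes SQLite, one of the stores named in the architecture reference, and a hypothetical events table keyed by (workflow_id, seq). EventLog and its append/history methods are illustrative names, not a real library API. Rows are only ever inserted, never updated, so reading them back in seq order reconstructs the workflow's full history.

```python
import json
import sqlite3
from typing import Any


class EventLog:
    """Append-only event log (hypothetical schema; a sketch, not Strategos's actual storage)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " workflow_id TEXT NOT NULL, seq INTEGER NOT NULL,"
            " kind TEXT NOT NULL, payload TEXT NOT NULL,"
            " PRIMARY KEY (workflow_id, seq))"
        )

    def append(self, workflow_id: str, kind: str, **payload: Any) -> int:
        # Next sequence number for this workflow; the primary key rejects duplicate seq values.
        (seq,) = self.conn.execute(
            "SELECT COALESCE(MAX(seq) + 1, 0) FROM events WHERE workflow_id = ?",
            (workflow_id,),
        ).fetchone()
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO events (workflow_id, seq, kind, payload) VALUES (?, ?, ?, ?)",
                (workflow_id, seq, kind, json.dumps(payload)),
            )
        return seq

    def history(self, workflow_id: str) -> list[tuple[int, str, dict[str, Any]]]:
        # Events come back in the order they were written.
        rows = self.conn.execute(
            "SELECT seq, kind, payload FROM events WHERE workflow_id = ? ORDER BY seq",
            (workflow_id,),
        ).fetchall()
        return [(seq, kind, json.loads(payload)) for seq, kind, payload in rows]


log = EventLog()
log.append("wf-1", "WorkflowStarted", goal="Research X")
log.append("wf-1", "ActivityScheduled", activity_id=0, tool="web_search")
log.append("wf-1", "ActivityCompleted", activity_id=0, result="3 sources found")
log.append("wf-1", "TimerFired", timer_id="poll")
print([kind for _, kind, _ in log.history("wf-1")])
# ['WorkflowStarted', 'ActivityScheduled', 'ActivityCompleted', 'TimerFired']
```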
The Replay Mechanism
When a worker crashes and restarts:
- Load the full Event History.
- Replay the code from the beginning.
- Skip side effects (tool calls) that are already marked Completed in the log, returning their recorded results instead of re-running them.
- Resume execution exactly where it died.
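Result: Agents survive worker crashes and restarts, giving them effectively “infinite” uptime. Completed tool calls are never re-run on replay, which yields effectively Exactly-Once Execution for critical tools. A crash after a tool’s side effect but before its ActivityCompleted event is persisted can still cause a re-run, so such tools should be idempotent.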
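The replay steps can be modeled in a few lines. This toy sketch uses an in-memory list as the loaded history. It also relies on a hypothetical ReplayContext.execute_activity helper that gives the Nth tool call the id N. Matching is purely positional, so the workflow function must be deterministic: the same history must lead it to issue the same tool calls in the same order.

```python
from typing import Any, Callable


class ReplayContext:
    """Hypothetical durable-execution context: reuses recorded tool results, runs only new calls."""

    def __init__(self, history: list[dict[str, Any]]) -> None:
        self.history = history  # persisted events; new events are appended in place
        self.completed = {
            e["activity_id"]: e["result"] for e in history if e["kind"] == "ActivityCompleted"
        }
        self.next_id = 0  # deterministic counter: the Nth tool call gets the same id on every replay

    def execute_activity(self, tool: Callable[..., Any], *args: Any) -> Any:
        activity_id = self.next_id
        self.next_id += 1
        if activity_id in self.completed:
            # Finished before the crash: skip the side effect and return the logged result.
            return self.completed[activity_id]
        self.history.append({"kind": "ActivityScheduled", "activity_id": activity_id})
        result = tool(*args)
        # A crash here, after the side effect but before the append below, re-runs the tool on replay.
        self.history.append(
            {"kind": "ActivityCompleted", "activity_id": activity_id, "result": result}
        )
        self.completed[activity_id] = result
        return result


calls: list[str] = []


def search(query: str) -> str:
    calls.append(query)  # track real executions so skipped ones are visible
    return f"results for {query}"


def research_workflow(ctx: ReplayContext) -> list[str]:
    # Workflow code must be deterministic so replay issues the same calls in the same order.
    return [ctx.execute_activity(search, "topic A"), ctx.execute_activity(search, "topic B")]


# History as it stood when the worker crashed: the first search had already completed.
history = [
    {"kind": "WorkflowStarted"},
    {"kind": "ActivityScheduled", "activity_id": 0},
    {"kind": "ActivityCompleted", "activity_id": 0, "result": "results for topic A"},
]
print(research_workflow(ReplayContext(history)))  # ['results for topic A', 'results for topic B']
print(calls)  # ['topic B']: topic A came from the log and was not re-executed
```

Running the workflow a second time against the now five-event history executes no tools at all, because both activities are already marked completed.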
Pattern 2: The Memory Hierarchy (The MMU)
Just like an OS manages RAM, an Agent OS must manage Context.
| Tier | Analogy | Description | Latency |
|---|---|---|---|
| Working | L1 Cache | The immediate prompt context. Expensive ($), fast. | ms |
| Episodic | RAM / Swap | Vector DB (RAG) for recent interactions. | ~100ms |
| Structured | Disk | SQL/Graph DB for permanent facts (“User is Admin”). | ~10ms |
Context Paging: The orchestrator automatically “pages out” old turns from Working Memory to Episodic Memory, and “pages in” relevant facts based on the current Goal.
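This sketch of context paging rests on three stand-ins: word counts replace a tokenizer, Jaccard word overlap replaces embedding similarity, and a plain list replaces the vector DB. MemoryKernel, add_turn, and build_context are hypothetical names. Old turns are paged out when working memory exceeds its token budget. When the prompt is assembled, the most goal-relevant episodic entries are paged back in.

```python
from dataclasses import dataclass, field


def n_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def relevance(query: str, text: str) -> float:
    # Stand-in for vector similarity: Jaccard overlap of lowercase word sets.
    a, b = set(query.lower().split()), set(text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


@dataclass
class MemoryKernel:
    budget: int                                    # max tokens kept in working memory
    working: list[str] = field(default_factory=list)
    episodic: list[str] = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        self.working.append(turn)
        # Page out the oldest turns until working memory fits (always keep the newest turn).
        while sum(map(n_tokens, self.working)) > self.budget and len(self.working) > 1:
            self.episodic.append(self.working.pop(0))

    def build_context(self, goal: str, k: int = 1) -> list[str]:
        # Page in the k episodic memories most relevant to the goal, ahead of the live turns.
        ranked = sorted(self.episodic, key=lambda t: relevance(goal, t), reverse=True)
        return ranked[:k] + self.working


mem = MemoryKernel(budget=8)
for turn in ["user wants a report on solar panels",
             "assistant searched vendor prices",
             "user asked about battery storage"]:
    mem.add_turn(turn)
context = mem.build_context(goal="solar panels report")
print(context)  # ['user wants a report on solar panels', 'user asked about battery storage']
```

For brevity, paged-in memories are not charged against the budget here, and the Structured tier is omitted. A fuller version would handle both.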
Pattern 3: Cognitive Architectures
The “Brain” logic should be pluggable.
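One way to make the brain pluggable is a structural typing.Protocol. CognitiveArchitecture, FixedPlan, next_step, and the “DONE” sentinel are all hypothetical names. ReAct, Plan-and-Solve, and Reflection would each be another class exposing the same next_step method.

```python
from typing import Protocol


class CognitiveArchitecture(Protocol):
    """Hypothetical plug-in interface: propose the next step from the goal and history so far."""

    def next_step(self, goal: str, history: list[str]) -> str: ...


class FixedPlan:
    """Toy architecture used only to show the plug-in mechanics."""

    def __init__(self, steps: list[str]) -> None:
        self.steps = steps

    def next_step(self, goal: str, history: list[str]) -> str:
        return self.steps[len(history)] if len(history) < len(self.steps) else "DONE"


def run(brain: CognitiveArchitecture, goal: str, max_steps: int = 10) -> list[str]:
    # The orchestrator depends only on the protocol, so strategies can be swapped per workflow.
    history: list[str] = []
    for _ in range(max_steps):
        step = brain.next_step(goal, history)
        if step == "DONE":
            break
        history.append(step)
    return history


print(run(FixedPlan(["search", "summarize"]), "Research X"))  # ['search', 'summarize']
```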
ReAct (Reason + Act)
Interleaved thinking and doing. Good for dynamic environments.
Thought -> Action -> Observation -> Thought...
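Here is a minimal ReAct loop. It assumes the model writes “Action: tool[argument]” to act and “Final Answer: ...” to stop, which is a common prompt convention rather than a fixed standard. scripted_llm is a hypothetical stand-in for a real inference call, such as one routed to Hyperion, and the search tool is a stub.

```python
import re
from typing import Callable

# Assumed output format: "Action: tool[argument]" to act, "Final Answer: ..." to stop.
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def react(llm: Callable[[str], str], tools: dict[str, Callable[[str], str]],
          goal: str, max_steps: int = 5) -> str:
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        step = llm(transcript)              # Thought plus either an Action or a Final Answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: no parsable action\n"
            continue
        name, arg = match.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"  # feeds the next Thought
    raise RuntimeError("step limit reached without a final answer")


def scripted_llm(transcript: str) -> str:
    # Stand-in for an inference call: act once, then answer from the observation.
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: search[capital of France]"
    return "Thought: The observation answers the question.\nFinal Answer: Paris"


tools = {"search": lambda query: "Paris is the capital of France."}
print(react(scripted_llm, tools, "What is the capital of France?"))  # Paris
```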
Plan-and-Solve
Generate a full Dependency Graph (DAG) of tasks first, then execute. Good for complex, deterministic goals.
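Once a plan exists, executing it can be sketched with Python's standard graphlib module. The plan here is hand-written; in practice the model would produce it during the planning step. execute_plan and the task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def execute_plan(plan: dict[str, set[str]],
                 tasks: dict[str, Callable[[dict[str, str]], str]]) -> dict[str, str]:
    """Run each task after all of its dependencies, passing it their outputs."""
    results: dict[str, str] = {}
    # static_order() yields a dependency-respecting order and raises CycleError on a cyclic plan.
    for name in TopologicalSorter(plan).static_order():
        deps = {dep: results[dep] for dep in plan.get(name, set())}
        results[name] = tasks[name](deps)
    return results


# In a real system the LLM would emit this DAG up front; here it is hand-written.
plan = {
    "fetch_a": set(),
    "fetch_b": set(),
    "compare": {"fetch_a", "fetch_b"},
    "report": {"compare"},
}
tasks = {
    "fetch_a": lambda deps: "A=3",
    "fetch_b": lambda deps: "B=5",
    "compare": lambda deps: f"compared {deps['fetch_a']} vs {deps['fetch_b']}",
    "report": lambda deps: f"report: {deps['compare']}",
}
print(execute_plan(plan, tasks)["report"])  # report: compared A=3 vs B=5
```

static_order() runs tasks one at a time. TopologicalSorter's prepare(), get_ready(), and done() methods could instead dispatch independent tasks such as fetch_a and fetch_b in parallel.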
Reflection
A secondary loop where the agent critiques its own output before finalizing it. Increases quality at the cost of latency.
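This sketch of the reflection loop assumes two separate model roles, a generator and a critic, represented here by hypothetical stub functions. Each critique round adds at least one critic call and possibly one more generation, which is where the extra latency comes from.

```python
from typing import Callable, Optional


def reflect(generate: Callable[[str, str], str],
            critique: Callable[[str], Optional[str]],
            task: str, max_rounds: int = 3) -> str:
    """Generate, then let a critic review and request revisions up to max_rounds times."""
    draft = generate(task, "")             # first attempt, no feedback yet
    for _ in range(max_rounds):
        feedback = critique(draft)         # None means the critic accepts the draft
        if feedback is None:
            break
        draft = generate(task, feedback)   # each extra round adds model calls, hence latency
    return draft


# Stand-ins for two LLM calls (generator and critic); a real system would prompt a model for each.
def generate(task: str, feedback: str) -> str:
    text = f"Answer to: {task}"
    return text + " [with sources]" if "sources" in feedback else text


def critique(draft: str) -> Optional[str]:
    return None if "[with sources]" in draft else "cite your sources"


print(reflect(generate, critique, "Summarize Q3 churn"))  # Answer to: Summarize Q3 churn [with sources]
```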
Architecture Reference
```mermaid
graph TD
    Client -->|Goal| Gateway
    Gateway -->|Start Workflow| Orchestrator[Strategos: Durable Engine]
    subgraph "The Agent Loop"
        Orchestrator <-->|Replay/Persist| EventLog[(SQLite/Postgres)]
        Orchestrator -->|Context| Memory[Memory Kernel]
        Orchestrator -->|Prompt| LLM[Hyperion: Inference]
        Orchestrator -->|Validate| Guardian[Safety Layer]
        Guardian -->|Execute| Tools[Tool Registry]
    end
```
Risks & Mitigations
- Infinite Loops: Agents getting stuck repeating “I need to check status”.
- Fix: Step limits and semantic loop detection (embedding similarity of the last N thoughts).
- Context Pollution: Retrieving irrelevant memories confuses the model.
- Fix: Strict relevance thresholds and query rewriting.
- Cost Runaway: Long loops and large contexts consume tokens, and therefore money, without bound.
- Fix: Token quotas per workflow and per tenant.
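The three fixes can share one per-workflow guard. This sketch uses bag-of-words cosine similarity as a stand-in for real embeddings. WorkflowGuard, check, and the default limits are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, deque
from typing import Optional


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class WorkflowGuard:
    """Hypothetical per-workflow guard: step limit, token quota, and semantic loop detection."""

    def __init__(self, max_steps: int = 20, token_quota: int = 10_000,
                 window: int = 3, threshold: float = 0.9) -> None:
        self.max_steps = max_steps
        self.token_quota = token_quota
        self.threshold = threshold
        self.recent: deque = deque(maxlen=window)  # embeddings of the last `window` thoughts
        self.steps = 0
        self.tokens_used = 0

    def check(self, thought: str, tokens: int) -> Optional[str]:
        """Record one step; return a reason to halt the workflow, or None to continue."""
        self.steps += 1
        self.tokens_used += tokens
        if self.steps > self.max_steps:
            return "step limit exceeded"
        if self.tokens_used > self.token_quota:
            return "token quota exceeded"
        vec = embed(thought)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and all(cosine(vec, prev) >= self.threshold for prev in self.recent):
            return "semantic loop detected"
        self.recent.append(vec)
        return None


guard = WorkflowGuard(window=2)
print([guard.check("I need to check status", tokens=50) for _ in range(3)])
# [None, None, 'semantic loop detected']
```

A per-tenant quota would work the same way, with tokens_used held in a counter shared across all of that tenant's workflows.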
Related Projects
- Strategos: The reference implementation of this orchestration pattern.
- Hyperion: The inference engine powering the LLM calls.