Stable v1.0 · Last Updated: 1/3/2026

The Architecture of a Semantic Firewall

How to design a low-latency, fail-closed security layer for autonomous agents. Lessons from building Guardian.

Summary

  • What it is: A deterministic, low-latency firewall that validates agent tool calls before execution.
  • Why now: Agents are gaining real access to production systems, making action safety a first-class requirement.
  • Who it’s for: Teams shipping autonomous agents that call APIs, execute SQL, or trigger high-risk operations.

Goals

  • Block unsafe tool calls before execution (fail-closed).
  • Keep latency low enough for interactive experiences.
  • Provide auditability and clear policy reasons for blocks.

Non-Goals

  • Replacing backend authorization (this complements it).
  • Detecting every possible business logic bug.
  • Acting as the only safety layer in a full platform.

Requirements

Functional

  • Intercept all agent tool calls.
  • Enforce policy and role constraints.
  • Detect unsafe code patterns (SQL/Python/CLI).
  • Escalate to content inspection when needed.
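The requirements above can be sketched as a thin interception wrapper that sits between the agent and execution. This is a minimal sketch: the names `ToolCall`, `guardian_check`, and `intercept`, and the sample policy rule, are illustrative, not Guardian's actual API.

```python
# Hypothetical sketch of the interception requirement; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str                       # e.g. "sql.execute"
    args: dict = field(default_factory=dict)
    role: str = "agent"             # caller role, used by policy checks

def guardian_check(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason_code). Stand-in for the tiered checks."""
    if call.tool.startswith("admin.") and call.role != "admin":
        return False, "POLICY_ROLE_DENIED"
    return True, "OK"

def intercept(call: ToolCall, execute):
    """Every tool call passes through the firewall before execution."""
    allowed, reason = guardian_check(call)
    if not allowed:
        return {"status": "BLOCK", "reason": reason}
    return {"status": "ALLOW", "result": execute(call)}
```

Note that the wrapper returns a reason code on every block, per the explainability requirement below.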

Non-Functional

  • Latency: p95 < 50ms.
  • Reliability: fail-closed on errors.
  • Explainability: return a reason code for every block.

Architecture Overview

```mermaid
graph LR
    Agent -->|Tool Call| Guardian

    subgraph Guardian [The Semantic Firewall]
        Tier1[Tier 1: Policy Engine] -->|Pass| Tier2
        Tier2[Tier 2: Static Analysis] -->|Pass| Tier3
        Tier3[Tier 3: Content Inspection]
    end

    Tier1 -->|BLOCK| Agent
    Tier2 -->|BLOCK| Agent
    Tier3 -->|ALLOW / BLOCK| Agent
```

Key Design Decisions

1) Fail-Closed vs. Fail-Open

  • Options: Allow on error vs. block on error.
  • Choice: Fail-closed.
  • Rationale: Unsafe actions are higher risk than temporary unavailability.
  • Tradeoffs: Availability degradation during safety outages.
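The fail-closed choice reduces to a simple invariant: any internal error in the firewall blocks the action rather than letting it through. A minimal sketch, with illustrative names:

```python
# Fail-closed sketch: an exception inside the safety layer must not
# become an open gate. Check functions return (ok, reason_code).
def evaluate(call, checks):
    try:
        for check in checks:
            ok, reason = check(call)
            if not ok:
                return ("BLOCK", reason)
        return ("ALLOW", "OK")
    except Exception:
        # Fail-closed: a broken check blocks the call.
        return ("BLOCK", "FIREWALL_ERROR")
```

This is the source of the availability tradeoff noted above: an outage in any check degrades into blocked actions, not unsafe ones.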

2) AST vs. LLM for Code Analysis

  • Options: Deterministic AST vs. probabilistic LLM checks.
  • Choice: AST.
  • Rationale: Fast and consistent; blocks known-bad patterns reliably.
  • Tradeoffs: Limited semantic reasoning for complex logic bugs.
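For Python payloads, the deterministic approach maps directly onto the standard `ast` module: parse once, walk the tree, flag known-bad call patterns. The pattern list below is illustrative, not Guardian's actual ruleset.

```python
# Sketch of a deterministic AST scan for dangerous Python patterns.
import ast

DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}  # illustrative

def scan_python(source: str) -> list[str]:
    """Return reason codes for dangerous call patterns found in `source`."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in DANGEROUS_CALLS:
                flags.append(f"AST_DANGEROUS_CALL:{func.id}")
            # attribute-style shell escapes, e.g. os.system(...)
            elif isinstance(func, ast.Attribute) and func.attr == "system":
                flags.append("AST_SHELL_EXEC")
    return flags
```

Because this only parses, never executes, it is fast and consistent; the tradeoff is exactly as stated: it cannot reason about semantics beyond the patterns it knows.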

3) Chain of Responsibility

  • Options: Run all checks vs. short-circuit.
  • Choice: Short-circuit in a chain.
  • Rationale: Saves compute; cheap checks first, expensive last.
  • Tradeoffs: Requires good ordering and thresholds.
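The chain can be sketched as an ordered list of tier functions that short-circuits on the first block. Tier bodies and tool names here are stand-ins; each returns a `("BLOCK", reason)` verdict or `None` to pass.

```python
# Short-circuit chain sketch: cheap checks first, expensive last.
def policy_tier(call):        # cheapest tier runs first
    if call.get("tool") == "db.drop_table":
        return ("BLOCK", "POLICY_FORBIDDEN_TOOL")

def static_tier(call):        # mid-cost static check
    if "DROP TABLE" in call.get("code", "").upper():
        return ("BLOCK", "AST_DROP_TABLE")

def content_tier(call):       # most expensive; only reached if earlier tiers pass
    return ("ALLOW", "OK")

def run_chain(call, tiers=(policy_tier, static_tier, content_tier)):
    for tier in tiers:
        verdict = tier(call)
        if verdict is not None and verdict[0] == "BLOCK":
            return verdict    # short-circuit: skip the remaining tiers
    return ("ALLOW", "OK")
```

The ordering tradeoff is visible in the tuple default: reorder the tiers and a cheap call may pay for an expensive check it never needed.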

Data Flows

Happy Path

  1. Agent emits tool call.
  2. Policy engine approves.
  3. AST analysis passes.
  4. Content inspection passes.
  5. Action allowed.

Failure Path

  1. Agent emits tool call.
  2. Policy engine blocks or AST flags a dangerous pattern.
  3. Guardian returns a block with reason code.
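A block response needs a stable, machine-readable reason code plus enough context to audit. One possible shape, with illustrative field names:

```python
# Sketch of a block response carrying a reason code and audit context.
import datetime
import uuid

def block_response(tool: str, reason_code: str, tier: str) -> dict:
    return {
        "decision": "BLOCK",
        "reason_code": reason_code,   # stable, machine-readable
        "tier": tier,                 # which tier fired
        "tool": tool,
        "request_id": str(uuid.uuid4()),
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
```

Keeping the reason code separate from any human-readable message lets dashboards aggregate blocks by cause without string parsing.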

Observability

  • Metrics: block rate, tier latency, reason distribution.
  • Logs: request_id, tool name, policy decision, AST flags.
  • Tracing: span per tier for audit trails.
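The metrics and logs above can share one structured record per decision, with per-tier latencies captured by a timing wrapper. A sketch under assumed field names:

```python
# Sketch: time each tier and emit one structured log line per decision.
import json
import time

def timed(tier_fn, call, timings: dict, name: str):
    """Run a tier and record its latency in milliseconds."""
    start = time.perf_counter()
    result = tier_fn(call)
    timings[name] = round((time.perf_counter() - start) * 1000, 3)
    return result

def log_decision(request_id, tool, decision, reason, timings):
    record = {
        "request_id": request_id,
        "tool": tool,
        "decision": decision,
        "reason": reason,
        "tier_latency_ms": timings,   # feeds the tier-latency metric
    }
    print(json.dumps(record))         # stand-in for a real log sink
    return record
```

Aggregating `reason` over these records yields the reason distribution metric directly.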

Risks & Mitigations

  • Policy drift: version policies and audit changes.
  • False positives: use tiered checks + allowlist overrides.
  • Latency spikes: cap deep checks and short-circuit aggressively.

Rollout Plan

  • Start in shadow mode (log only).
  • Enable blocking for high-risk actions first.
  • Expand to broader tool set with monitored thresholds.
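The first two rollout steps combine into a single gate: evaluate every call, but enforce blocks only for the high-risk tool set while everything else stays log-only. A sketch with an illustrative tool list:

```python
# Shadow-mode sketch: enforce blocks only for high-risk tools at first.
HIGH_RISK_TOOLS = {"sql.execute", "shell.run"}   # illustrative enforcement list

def decide(tool: str, verdict: str, shadow: bool = True) -> str:
    """In shadow mode, would-be blocks are logged but allowed,
    except for tools already promoted to enforcement."""
    if verdict == "BLOCK":
        if not shadow or tool in HIGH_RISK_TOOLS:
            return "BLOCK"
        print(f"shadow: would block {tool}")     # log-only observation
        return "ALLOW"
    return "ALLOW"
```

Comparing shadow-mode "would block" logs against real traffic is what calibrates the monitored thresholds before expanding enforcement.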

Open Questions

  • Where should manual approval workflows live?
  • How should policy rules be shared across tenants?

Notes

Guardian processes an average request in ~18ms (Tier 1 policy: 0.5ms, Tier 2 AST: 3ms, Tier 3 Sentinel content inspection: 15ms). This doc is part of the Aether platform.

Live demo: /guardian