AI agent sandboxing
What is an AI agent?
An AI agent is an LLM with tools. Instead of just generating text, it can take actions: run shell commands, read and write files, make API calls, browse the web, send messages, and execute code.
Traditional LLM usage is conversational—you ask a question, you get an answer. An AI agent operates autonomously. You give it a goal (“deploy this application,” “research these competitors,” “fix this bug”) and it figures out the steps, executes them, and handles errors along the way.
Examples include:
- Coding assistants like Claude Code, GitHub Copilot Workspace, and Cursor that can edit files, run tests, and commit code
- Personal assistants like OpenClaw/Moltbot that manage calendars, send emails, and browse the web
- DevOps agents that monitor systems, respond to alerts, and execute runbooks
- Research agents that gather information, synthesize findings, and produce reports
The more useful an agent is, the more access it needs. And the more access it has, the more dangerous it becomes without proper sandboxing.
What happens without sandboxing
The OpenClaw project (formerly Clawdbot and Moltbot) demonstrates what happens when AI agents run without proper isolation. The open-source “personal AI assistant” gained 180,000+ GitHub stars in weeks—and immediately became a security incident.
No isolation by default. OpenClaw runs with the same access as the user: full file system, network, and credentials. The project documentation admits “there is no ‘perfectly secure’ setup.”
Thousands of exposed instances. Security researchers found 4,500+ OpenClaw instances exposed to the internet, many with no authentication. Attackers could exfiltrate API keys, service tokens, and session credentials.
Remote code execution. A critical vulnerability (CVE-2026-25253) allowed one-click RCE through a malicious link—token exfiltration leading to full gateway compromise.
Malicious extensions. Over 400 malicious “skills” appeared on ClawHub, posing as crypto trading tools while delivering info-stealing malware.
No trust boundaries. Web content and third-party extensions can directly influence the agent’s planning and execution without policy mediation. Prompt injection attacks go straight to tool invocation.
Palo Alto Networks called this the “lethal trifecta”: access to private data, exposure to untrusted content, and ability to communicate externally. The recommended mitigation? Run the agent in an isolated VM.
Why AI agents can’t be secured with policy
Traditional security relies on defining what’s allowed and blocking everything else. This doesn’t work for AI agents.
AI agents are non-deterministic. You can’t enumerate valid behavior because you don’t know what the agent will do next. It might write a Python script, install a package, make an API call, or build a container—all as part of a legitimate task.
This means you can’t write policy. You can’t create allowlists. You can’t use traditional access controls without crippling the agent’s usefulness.
Sandboxing is required, not optional. The only viable approach is to let agents do whatever they need while containing the blast radius. If an agent is compromised or misbehaves, the damage stays within its sandbox.
The problem: agents need dangerous permissions
When you run an AI agent in “yolo mode” (fully autonomous), it needs:
- File system access: Reading and writing files, creating directories, modifying configurations
- Network access: Making HTTP requests, pulling Git repos, accessing APIs
- Process execution: Running arbitrary commands, compiling code, executing scripts
- Container operations: Building images, pushing to registries, managing workloads
In a traditional container environment, granting these permissions means the agent can potentially:
- Escape the container and access the host
- Read secrets from other containers via /proc
- Move laterally across your infrastructure
- Exfiltrate data through network connections
The shared-kernel model of traditional containers means a compromised or misbehaving AI agent becomes a threat to your entire node.
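To make that concrete, here is a sketch (not a recommendation) of the kind of permissive pod spec teams end up writing to give an agent this level of access on a standard shared-kernel runtime. The names and image are placeholders.

```yaml
# Illustrative only: an over-privileged agent pod on a shared-kernel runtime.
# Each setting widens what a compromised agent can reach on the host.
apiVersion: v1
kind: Pod
metadata:
  name: ai-agent-unsandboxed
spec:
  containers:
    - name: agent
      image: your-ai-agent-image           # placeholder
      securityContext:
        privileged: true                   # full capabilities and device access
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock  # lets the agent drive the host's container daemon
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
```

A privileged container with the host's container socket mounted is effectively root on the node, which is exactly the exposure described above.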
How sandboxing approaches compare
Not all sandboxes are created equal. The architecture matters.
| Approach | Isolation level | Impact of a kernel vulnerability |
|---|---|---|
| nono (Trail of Bits) | Process-level (Landlock, Seatbelt) | Breaks sandbox |
| Docker sandbox | Container-level | Breaks sandbox |
| Edera | VM-level with per-workload kernels | Sandbox intact |
Process-level and container-level sandboxes share the host kernel. A kernel vulnerability—and there are many—gives an attacker a path out. Edera runs each workload with its own kernel. A kernel exploit compromises one zone, not your cluster.
This isn’t incremental improvement. It’s a different architecture entirely.
How Edera sandboxes AI agents
Edera zones provide hardware-level isolation for AI agents. Each agent runs in its own dedicated virtual machine with:
- Separate kernel: No shared kernel attack surface
- Isolated file system: Agent’s file operations can’t escape the zone
- Contained network: Network access is scoped to the zone
- Independent process space: No visibility into other workloads
Two lines of Kubernetes config. Your agent image runs unmodified:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ai-agent
spec:
  runtimeClassName: edera
  containers:
    - name: claude-agent
      image: your-ai-agent-image
      env:
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: anthropic-credentials
              key: api-key
```

The agent gets a complete Linux environment—file system access, network connectivity, process execution—but it’s all contained within the Edera zone. Even if the agent is compromised or behaves unexpectedly, it can’t affect anything outside its zone.
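The `runtimeClassName: edera` reference assumes a RuntimeClass of that name exists in the cluster. A minimal sketch is below; the handler value is an assumption and depends on how the Edera runtime is installed on your nodes.

```yaml
# Sketch: the RuntimeClass the pod spec above points at.
# The handler name is assumed; use whatever your Edera installation registers.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: edera
handler: edera
```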
Performance
Sandboxing doesn’t have to cost performance. Edera’s benchmarks against Docker:
| Metric | Edera vs Docker |
|---|---|
| CPU | 0.9% slower |
| Memory | 0-7% faster |
| Syscalls | 3% faster average |
The overhead is negligible. You get VM-level isolation at near-native container performance.
What this protects against
Edera sandboxing defends against:
- Container escapes: Agent can’t break out of its zone
- Kernel exploits: Per-workload kernels limit blast radius
- Lateral movement: No access to other pods or the host
- Secret theft: No visibility into /proc of other workloads
- Host compromise: Agent has no path to the underlying node
What this doesn’t protect against
Edera sandboxes the compute environment. It doesn’t sandbox:
- API credentials: If an agent has valid API keys, it can use them. Scope credentials appropriately.
- Network destinations: An agent can make outbound requests to any allowed endpoint. Use network policies for egress control.
- Data the agent is given: If you pass sensitive data to an agent, the agent has that data.
Sandboxing contains the blast radius. It doesn’t replace proper credential management and data handling.
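For the egress point above, a standard Kubernetes NetworkPolicy is one way to scope where an agent pod can connect. A minimal sketch, assuming the agent pod carries an `app: ai-agent` label and runs in an `agents` namespace (both placeholders):

```yaml
# Sketch: allow only DNS and HTTPS egress from the agent pod; all other
# outbound traffic from pods matching the selector is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-agent-egress
  namespace: agents            # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: ai-agent            # placeholder label
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 53             # DNS
        - protocol: TCP
          port: 53
        - protocol: TCP
          port: 443            # HTTPS to allowed APIs
```

Enforcement requires a CNI plugin that implements NetworkPolicy; tighten the rule further with `to:` blocks if you can enumerate the endpoints your agent actually needs.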
Use cases
Development agents: Run coding assistants like Claude Code with full autonomy. The agent can modify files, run tests, and build applications—all isolated from your production systems.
CI/CD automation: Let AI agents handle build and deployment tasks without granting access to your actual infrastructure.
Research and experimentation: Give agents freedom to explore, install packages, and run experiments without worrying about cleanup or security implications.
Multi-tenant AI platforms: Run multiple AI agents for different users or teams, each in their own zone, with no cross-contamination risk.