AI agent sandboxing

6 min read · Beginner


What is an AI agent?

An AI agent is an LLM with tools. Instead of just generating text, it can take actions: run shell commands, read and write files, make API calls, browse the web, send messages, and execute code.

Traditional LLM usage is conversational—you ask a question, you get an answer. An AI agent operates autonomously. You give it a goal (“deploy this application,” “research these competitors,” “fix this bug”) and it figures out the steps, executes them, and handles errors along the way.

Examples include:

  • Coding assistants like Claude Code, GitHub Copilot Workspace, and Cursor that can edit files, run tests, and commit code
  • Personal assistants like OpenClaw/Moltbot that manage calendars, send emails, and browse the web
  • DevOps agents that monitor systems, respond to alerts, and execute runbooks
  • Research agents that gather information, synthesize findings, and produce reports

The more useful an agent is, the more access it needs. And the more access it has, the more dangerous it becomes without proper sandboxing.

What happens without sandboxing

The OpenClaw project (formerly Clawdbot and Moltbot) demonstrates what happens when AI agents run without proper isolation. The open-source “personal AI assistant” gained 180,000+ GitHub stars in weeks—and immediately became a security incident.

No isolation by default. OpenClaw runs with the same access as the user: full file system, network, and credentials. The project documentation admits “there is no ‘perfectly secure’ setup.”

Thousands of exposed instances. Security researchers found 4,500+ OpenClaw instances exposed to the internet, many with no authentication. Attackers could exfiltrate API keys, service tokens, and session credentials.

Remote code execution. A critical vulnerability (CVE-2026-25253) allowed one-click RCE through a malicious link—token exfiltration leading to full gateway compromise.

Malicious extensions. Over 400 malicious “skills” appeared on ClawHub, posing as crypto trading tools while delivering info-stealing malware.

No trust boundaries. Web content and third-party extensions can directly influence the agent’s planning and execution without policy mediation. Prompt injection attacks go straight to tool invocation.

Palo Alto Networks called this the “lethal trifecta”: access to private data, exposure to untrusted content, and ability to communicate externally. The recommended mitigation? Run the agent in an isolated VM.

Why AI agents can’t be secured with policy

Traditional security relies on defining what’s allowed and blocking everything else. This doesn’t work for AI agents.

AI agents are non-deterministic. You can’t enumerate valid behavior because you don’t know what the agent will do next. It might write a Python script, install a package, make an API call, or build a container—all as part of a legitimate task.

This means you can’t write policy. You can’t create allowlists. You can’t use traditional access controls without crippling the agent’s usefulness.

Sandboxing is required, not optional. The only viable approach is to let agents do whatever they need while containing the blast radius. If an agent is compromised or misbehaves, the damage stays within its sandbox.

The problem: agents need dangerous permissions

When you run an AI agent in “yolo mode” (fully autonomous), it needs:

  • File system access: Reading and writing files, creating directories, modifying configurations
  • Network access: Making HTTP requests, pulling Git repos, accessing APIs
  • Process execution: Running arbitrary commands, compiling code, executing scripts
  • Container operations: Building images, pushing to registries, managing workloads

In a traditional container environment, granting these permissions means the agent can potentially:

  • Escape the container and access the host
  • Read secrets from other containers via /proc
  • Move laterally across your infrastructure
  • Exfiltrate data through network connections

The shared-kernel model of traditional containers means a compromised or misbehaving AI agent becomes a threat to your entire node.
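
For contrast, here is roughly what granting that level of access looks like in a plain Kubernetes pod spec with no extra isolation. This is an illustrative sketch, not a recommendation: the pod name, image name, and mount paths are placeholders, and the point is that privileged mode, host mounts, and the container runtime socket all hand the agent direct access to the shared host.

apiVersion: v1
kind: Pod
metadata:
  name: yolo-agent                     # hypothetical over-privileged agent pod
spec:
  containers:
  - name: agent
    image: your-ai-agent-image         # placeholder image
    securityContext:
      privileged: true                 # full capability and device access to the shared kernel
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock  # container operations: effectively control of the node's runtime
    - name: host-root
      mountPath: /host                 # file system access that reaches the node itself
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
  - name: host-root
    hostPath:
      path: /

Every field here is a normal, supported Kubernetes setting. The risk comes from the shared kernel and runtime underneath them.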

How sandboxing approaches compare

Not all sandboxes are created equal. The architecture matters.

  Approach               Isolation level                      Kernel vulnerability
  nono (Trail of Bits)   Process-level (Landlock, Seatbelt)   Breaks sandbox
  Docker sandbox         Container-level                      Breaks sandbox
  Edera                  VM-level with per-workload kernels   Sandbox intact

Process-level and container-level sandboxes share the host kernel. A kernel vulnerability—and there are many—gives an attacker a path out. Edera runs each workload with its own kernel. A kernel exploit compromises one zone, not your cluster.

This isn’t incremental improvement. It’s a different architecture entirely.

How Edera sandboxes AI agents

Edera zones provide hardware-level isolation for AI agents. Each agent runs in its own dedicated virtual machine with:

  • Separate kernel: No shared kernel attack surface
  • Isolated file system: Agent’s file operations can’t escape the zone
  • Contained network: Network access is scoped to the zone
  • Independent process space: No visibility into other workloads

Two lines of Kubernetes config. Your agent image runs unmodified:

apiVersion: v1
kind: Pod
metadata:
  name: ai-agent
spec:
  runtimeClassName: edera   # the line that moves this pod into an isolated Edera zone
  containers:
  - name: claude-agent
    image: your-ai-agent-image
    env:
    - name: ANTHROPIC_API_KEY
      valueFrom:
        secretKeyRef:
          name: anthropic-credentials
          key: api-key

The agent gets a complete Linux environment—file system access, network connectivity, process execution—but it’s all contained within the Edera zone. Even if the agent is compromised or behaves unexpectedly, it can’t affect anything outside its zone.
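
The runtimeClassName: edera field refers to a RuntimeClass object registered in the cluster. As a rough sketch, assuming the handler name matches your Edera installation (the exact value depends on how Edera was set up), the resource looks like this:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: edera       # the name referenced by runtimeClassName in the pod spec
handler: edera      # handler value is installation-specific; check your Edera setup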

Performance

Sandboxing doesn’t have to cost performance. Edera’s benchmarks against Docker:

  Metric      Edera vs Docker
  CPU         0.9% slower
  Memory      0-7% faster
  Syscalls    3% faster (average)

The overhead is negligible. You get VM-level isolation at near-native container performance.

What this protects against

Edera sandboxing defends against:

  • Container escapes: Agent can’t break out of its zone
  • Kernel exploits: Per-workload kernels limit blast radius
  • Lateral movement: No access to other pods or the host
  • Secret theft: No visibility into /proc of other workloads
  • Host compromise: Agent has no path to the underlying node

What this doesn’t protect against

Edera sandboxes the compute environment. It doesn’t sandbox:

  • API credentials: If an agent has valid API keys, it can use them. Scope credentials appropriately.
  • Network destinations: An agent can make outbound requests to any allowed endpoint. Use network policies for egress control.
  • Data the agent is given: If you pass sensitive data to an agent, the agent has that data.

Sandboxing contains the blast radius. It doesn’t replace proper credential management and data handling.
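
As one example of the egress control mentioned above, a standard Kubernetes NetworkPolicy can restrict where an agent pod is allowed to connect. The label and CIDR below are placeholders for illustration; adapt them to your environment:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-agent-egress
spec:
  podSelector:
    matchLabels:
      app: ai-agent               # placeholder label; match it to your agent pods
  policyTypes:
  - Egress
  egress:
  - to:                           # HTTPS only to an approved range (example CIDR)
    - ipBlock:
        cidr: 203.0.113.0/24
    ports:
    - protocol: TCP
      port: 443
  - ports:                        # DNS so the agent can still resolve names
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Combined with tightly scoped API keys, this narrows what a compromised agent can reach even from inside its zone.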

Use cases

Development agents: Run coding assistants like Claude Code with full autonomy. The agent can modify files, run tests, and build applications—all isolated from your production systems.

CI/CD automation: Let AI agents handle build and deployment tasks without granting access to your actual infrastructure.

Research and experimentation: Give agents freedom to explore, install packages, and run experiments without worrying about cleanup or security implications.

Multi-tenant AI platforms: Run multiple AI agents for different users or teams, each in their own zone, with no cross-contamination risk.
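
A minimal sketch of that multi-tenant pattern, assuming one namespace per team and illustrative image names; each pod gets its own zone simply by setting the same runtime class:

apiVersion: v1
kind: Pod
metadata:
  name: agent
  namespace: team-a              # one namespace per team (illustrative names)
spec:
  runtimeClassName: edera        # each agent pod gets its own zone and kernel
  containers:
  - name: agent
    image: team-a-agent-image
---
apiVersion: v1
kind: Pod
metadata:
  name: agent
  namespace: team-b
spec:
  runtimeClassName: edera
  containers:
  - name: agent
    image: team-b-agent-image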
