OpenSandbox: The Universal Sandbox Platform Every AI Agent Needs

Part 1: Foundations — The Mental Model

Imagine you are an AI agent. You need to write code, run it, browse the web, interact with a desktop, maybe even train a model — all in a safe, isolated environment. The host system must not be affected, yet you need full power within the box.

That is exactly what OpenSandbox by Alibaba provides.

Mental Model: Think of OpenSandbox as a universal remote-controlled sandbox — a standardized socket into which any AI agent (Claude Code, Gemini CLI, LangGraph, Google ADK, etc.) can plug. The sandbox wraps Docker containers or Kubernetes pods and exposes one consistent API for creating environments, running commands, managing files, and interpreting code.

Instead of each AI framework inventing its own execution sandbox, OpenSandbox offers a single, open protocol that all of them can share.

Part 2: The Investigation — Architecture Deep Dive

The Layered Architecture

OpenSandbox is structured into clear layers, each solving one concern:

┌────────────────────────────────────────────────────────────┐
│                 Multi-Language SDKs                        │
│    Python  │  JS/TS  │  Java/Kotlin  │  C#/.NET  │  Go*   │
└────────────────────┬───────────────────────────────────────┘
                     │ Sandbox Protocol (OpenAPI / OSEPs)
┌────────────────────▼───────────────────────────────────────┐
│                OpenSandbox Server                          │
│  (Sandbox lifecycle: create, start, pause, resume, kill)   │
└──────┬──────────────────────────────────────────┬──────────┘
       │                                          │
┌──────▼──────┐                         ┌─────────▼──────────┐
│   Docker    │                         │     Kubernetes     │
│  Runtime    │                         │ (high-perf runtime)│
└─────────────┘                         └────────────────────┘
       │                                          │
┌──────▼──────────────────────────────────────────▼──────────┐
│              Sandbox Environments                          │
│  Commands  │  Files  │  Code Interpreter  │  Browser  │ VNC│
└────────────────────────────────────────────────────────────┘

(*Go SDK is on the roadmap)

Project Structure

| Directory | Purpose |
|---|---|
| sdks/ | Client SDKs (Python, JS/TS, Java, C#) |
| specs/ | OpenAPI + OSEP (OpenSandbox Enhancement Proposals) |
| server/ | The core sandbox server |
| kubernetes/ | Kubernetes runtime for distributed scheduling |
| components/execd/ | Execution daemon inside the sandbox container |
| components/ingress/ | Ingress gateway with multi-routing strategies |
| components/egress/ | Per-sandbox egress/network policy control |
| sandboxes/ | Pre-built sandbox images |
| examples/ | End-to-end integration examples |

Sandbox Protocol (OSEPs)

OpenSandbox uses a formal proposal process called OSEP (OpenSandbox Enhancement Proposals) to evolve the platform. This is similar to PEPs in Python, keeping the protocol community-driven and well-documented. The protocol defines two classes of APIs:

Lifecycle APIs: create, start, pause, resume, kill → manages the sandbox container
Execution APIs: commands.run, files.write, files.read, codes.run → interacts with what’s inside
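
To make the split between the two API classes concrete, here is a minimal sketch that models the lifecycle operations as a state machine and treats the execution operations as calls that are only valid while the sandbox is running. The state names, the transition table, and the `step` helper are assumptions made for illustration; they are not taken from the OpenSandbox specification.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names, chosen for illustration only.
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    KILLED = "killed"


# Lifecycle APIs: (current state, operation) -> next state.
TRANSITIONS = {
    (State.CREATED, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.CREATED, "kill"): State.KILLED,
    (State.RUNNING, "kill"): State.KILLED,
    (State.PAUSED, "kill"): State.KILLED,
}

# Execution APIs: they act inside the sandbox and need it to be running.
EXECUTION_OPS = {"commands.run", "files.write", "files.read", "codes.run"}


def step(state: State, op: str) -> State:
    """Return the state after `op`, or raise if the call is not valid here."""
    if op in EXECUTION_OPS:
        if state is not State.RUNNING:
            raise RuntimeError(f"{op} requires a running sandbox, got {state.value}")
        return state  # execution calls leave the lifecycle state unchanged
    try:
        return TRANSITIONS[(state, op)]
    except KeyError:
        raise RuntimeError(f"cannot {op} a sandbox that is {state.value}") from None


# Walk one sandbox through a typical session.
state = State.CREATED
for op in ["start", "commands.run", "pause", "resume", "codes.run", "kill"]:
    state = step(state, op)
print(state)  # State.KILLED
```

The point of the sketch is the separation: lifecycle calls move the container between states, while execution calls never do.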

Security — Strong Isolation Options

This is where OpenSandbox stands apart from naive Docker-only sandboxes. It natively supports secure container runtimes:

gVisor — userspace kernel that intercepts system calls
Kata Containers — lightweight VMs with hardware isolation
Firecracker microVMs — ultra-fast micro-virtual machines (used by AWS Lambda)

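Each puts a stronger boundary between sandbox workloads and the host than a default container, which shares the host kernel directly: gVisor interposes a userspace kernel, while Kata Containers and Firecracker run each workload inside its own lightweight virtual machine.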
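
As a rough illustration of what selecting a secure runtime means at the Docker level, the sketch below builds a `docker run` command line with the `--runtime` flag. It shows the general Docker mechanism, not how OpenSandbox itself is configured. `runsc` is the name of gVisor's OCI runtime binary; the name `kata-runtime` is an assumption, since the name under which Kata is registered depends on the installation. Firecracker is left out because it is usually reached through another layer rather than as a Docker runtime name.

```python
import shlex

# Isolation choice -> runtime name registered with the Docker daemon.
# These names are assumptions and must match your daemon configuration.
RUNTIMES = {
    "default": None,          # Docker's default runtime
    "gvisor": "runsc",        # gVisor's OCI runtime binary
    "kata": "kata-runtime",   # hypothetical; varies by Kata installation
}


def docker_run_argv(image: str, command: list[str], isolation: str = "default") -> list[str]:
    """Build the argv for `docker run` with the chosen isolation runtime."""
    argv = ["docker", "run", "--rm"]
    runtime = RUNTIMES[isolation]
    if runtime is not None:
        argv.append(f"--runtime={runtime}")
    argv.append(image)
    argv.extend(command)
    return argv


argv = docker_run_argv("python:3.11-slim", ["python", "-c", "print(1)"], "gvisor")
print(shlex.join(argv))  # shell-quoted form, ready to paste into a terminal
```

Building the argument list rather than a single string keeps the command safe to pass to `subprocess.run` without a shell.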

Part 3: The Diagnosis — What It Does for Developers

Problem 1: Every AI Agent Framework Reinvents the Same Sandbox

Before OpenSandbox, if you wanted to run Claude Code, Gemini CLI, and LangGraph safely side-by-side, you would need three different sandbox integration layers. OpenSandbox unifies them under one protocol.

Problem 2: Scaling From Laptop to Kubernetes Is Hard

OpenSandbox’s Docker runtime is for local development. Its Kubernetes runtime (kubernetes/) handles distributed, large-scale scheduling of thousands of sandboxes — without changing a single line of your application code. The same SDK calls work locally and in production.
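
The design idea behind "same calls, different backend" can be sketched with a small interface. Everything here (`Runtime`, `DockerRuntime`, `KubernetesRuntime`, `run_job`) is hypothetical and only illustrates the pattern; it is not the server's actual code, and the backends return placeholder strings instead of talking to Docker or Kubernetes.

```python
from typing import Protocol


class Runtime(Protocol):
    """Hypothetical backend interface, for illustration only."""

    def launch(self, image: str) -> str:
        """Start a sandbox from `image` and return an identifier for it."""
        ...


class DockerRuntime:
    def launch(self, image: str) -> str:
        # A real backend would call the Docker Engine API here.
        return f"docker://{image}"


class KubernetesRuntime:
    def launch(self, image: str) -> str:
        # A real backend would create a pod through the Kubernetes API here.
        return f"k8s://{image}"


def run_job(runtime: Runtime, image: str) -> str:
    """Application code: depends only on the interface, never on a backend."""
    return runtime.launch(image)


# The call site is identical; only the injected backend differs.
print(run_job(DockerRuntime(), "opensandbox/code-interpreter:v1.0.1"))
print(run_job(KubernetesRuntime(), "opensandbox/code-interpreter:v1.0.1"))
```

Because `run_job` never names a concrete backend, moving from a laptop to a cluster becomes a deployment decision rather than a code change.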

Problem 3: Multi-Language Teams Need Multi-Language SDKs

Currently supported SDKs:

| Language | Status |
|---|---|
| Python | ✅ Stable |
| JavaScript / TypeScript | ✅ Stable |
| Java / Kotlin | ✅ Stable |
| C# / .NET | ✅ Stable |
| Go | 🔜 Roadmap |

Real-World Use Cases

| Scenario | Example |
|---|---|
| Coding Agent | Claude Code, Gemini CLI, OpenAI Codex CLI |
| LLM Workflow | LangGraph state machines creating sandbox jobs |
| GUI Automation | Headless Chrome + Playwright in a sandbox |
| Desktop Environment | VNC + full Linux desktop inside a container |
| Remote Dev | VS Code (code-server) serving from a sandbox |
| RL Training | Run training episodes in isolated containers |
| Agent Evaluation | Reproducible, isolated eval environments |

Part 4: The Resolution — How to Use OpenSandbox

Quickstart in 3 Steps

Step 1 — Install and configure the server

uv pip install opensandbox-server
opensandbox-server init-config ~/.sandbox.toml --example docker

Step 2 — Start the sandbox server

opensandbox-server

Step 3 — Create a sandbox and run code

import asyncio
from datetime import timedelta
from code_interpreter import CodeInterpreter, SupportedLanguage
from opensandbox import Sandbox
from opensandbox.models import WriteEntry

async def main() -> None:
    # 1. Create a sandbox from a Docker image
    sandbox = await Sandbox.create(
        "opensandbox/code-interpreter:v1.0.1",
        entrypoint=["/opt/opensandbox/code-interpreter.sh"],
        env={"PYTHON_VERSION": "3.11"},
        timeout=timedelta(minutes=10),
    )

    async with sandbox:
        # 2. Run a shell command
        execution = await sandbox.commands.run("echo 'Hello OpenSandbox!'")
        print(execution.logs.stdout[0].text)   # Hello OpenSandbox!

        # 3. Write a file
        await sandbox.files.write_files([
            WriteEntry(path="/tmp/hello.txt", data="Hello World", mode=644)
        ])

        # 4. Read it back
        content = await sandbox.files.read_file("/tmp/hello.txt")
        print(f"Content: {content}")  # Content: Hello World

        # 5. Run Python code inside the sandbox
        interpreter = await CodeInterpreter.create(sandbox)
        result = await interpreter.codes.run(
            """
            import sys
            print(sys.version)
            result = 2 + 2
            result
            """,
            language=SupportedLanguage.PYTHON,
        )
        print(result.result[0].text)       # 4
        print(result.logs.stdout[0].text)  # 3.11.x

    # Sandbox auto-cleaned up

if __name__ == "__main__":
    asyncio.run(main())
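
One detail in step 5 is worth a note: the snippet sits indented inside a triple-quoted string, so the text sent to the sandbox carries leading whitespace. Whether that is tolerated depends on the interpreter on the other side. A minimal sketch for normalizing a snippet locally before sending it, using only the standard library (`normalize_snippet` is a hypothetical helper, not part of the SDK):

```python
import textwrap


def normalize_snippet(code: str) -> str:
    """Remove common leading indentation and surrounding blank lines."""
    return textwrap.dedent(code).strip("\n") + "\n"


snippet = normalize_snippet(
    """
    import sys
    print(sys.version)
    result = 2 + 2
    result
    """
)

# Cheap local check: compile() raises SyntaxError if the snippet is still mis-indented.
compile(snippet, "<sandbox-snippet>", "exec")
print(snippet, end="")
```

The normalized string can then be passed to `interpreter.codes.run` in place of the raw literal.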

Integrating with a Coding Agent (Google ADK Example)

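# examples/google-adk: use OpenSandbox as the tool backend for a Google ADK agent
from code_interpreter import CodeInterpreter, SupportedLanguage
from google.adk.tools import BaseTool
from opensandbox import Sandbox

class SandboxRunTool(BaseTool):
    """Sketch of an ADK tool that runs Python code in a fresh sandbox."""

    async def run_in_sandbox(self, code: str) -> str:
        # Create a short-lived sandbox for this single call.
        sandbox = await Sandbox.create("opensandbox/code-interpreter:v1.0.1")
        async with sandbox:
            interpreter = await CodeInterpreter.create(sandbox)
            result = await interpreter.codes.run(code, language=SupportedLanguage.PYTHON)
            # Return the first result's text; assumes the code ends in an expression.
            return result.result[0].text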

Running Claude Code or Gemini CLI in a Sandbox

# Clone the examples
git clone https://github.com/alibaba/OpenSandbox.git
cd OpenSandbox/examples/claude-code  # or gemini-cli, codex-cli, etc.

# Follow the README in each example directory

Each example ships with a Dockerfile and a startup script that launches the chosen AI CLI tool inside a managed OpenSandbox environment.

Final Mental Model

┌────────────────────────────────────────────────────────────┐
│                        OpenSandbox                         │
│                                                            │
│  "A universal socket for AI agent execution"               │
│                                                            │
│  What it IS:                                               │
│  → Open protocol sandbox with lifecycle + execution APIs   │
│  → Multi-language SDKs (Python, JS, Java, C#)              │
│  → Docker local dev + Kubernetes production scaling        │
│                                                            │
│  What it SOLVES:                                           │
│  → Fragmented sandbox implementations per AI framework     │
│  → Unsafe code execution without isolation                 │
│  → Scaling from laptop to cloud without code changes       │
│                                                            │
│  What it ENABLES:                                          │
│  → Coding agents (Claude, Gemini, Codex)                   │
│  → GUI agents (Chrome, Playwright, VNC)                    │
│  → RL training + agent evaluation                          │
│  → Remote dev (VS Code inside a sandbox)                   │
│                                                            │
│  Isolation options: gVisor | Kata Containers | Firecracker │
└────────────────────────────────────────────────────────────┘

GitHub: alibaba/OpenSandbox
Docs: open-sandbox.ai