OpenSandbox: The Universal Sandbox Platform Every AI Agent Needs
Alibaba open-sourced a general-purpose sandbox platform for AI applications — supporting Coding Agents, GUI Agents, RL Training and more, with multi-language SDKs and Docker/Kubernetes runtimes.
Part 1: Foundations — The Mental Model
Imagine you are an AI agent. You need to write code, run it, browse the web, interact with a desktop, maybe even train a model — all in a safe, isolated environment. The host system must not be affected, yet you need full power within the box.
That is exactly what OpenSandbox by Alibaba provides.
Mental Model: Think of OpenSandbox as a universal remote-controlled sandbox — a standardized socket into which any AI agent (Claude Code, Gemini CLI, LangGraph, Google ADK, etc.) can plug. The sandbox wraps Docker containers or Kubernetes pods and exposes one consistent API for creating environments, running commands, managing files, and interpreting code.
Instead of each AI framework inventing its own execution sandbox, OpenSandbox offers a single, open protocol that all of them can share.
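To make the "one consistent API" idea concrete, here is a minimal sketch in Python. `SandboxLike`, `FakeSandbox`, and `agent_step` are hypothetical names chosen for illustration, not part of the OpenSandbox SDK. The fake keeps everything in memory so the example runs without a server.

```python
from typing import Protocol


class SandboxLike(Protocol):
    """Hypothetical minimal surface an agent framework would code against."""

    def run(self, command: str) -> str: ...
    def write_file(self, path: str, data: str) -> None: ...
    def read_file(self, path: str) -> str: ...


class FakeSandbox:
    """In-memory stand-in; a real backend would talk to a sandbox server."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def run(self, command: str) -> str:
        # Only reports the command back; nothing is actually executed.
        return f"ran: {command}"

    def write_file(self, path: str, data: str) -> None:
        self._files[path] = data

    def read_file(self, path: str) -> str:
        return self._files[path]


def agent_step(sandbox: SandboxLike, script: str) -> str:
    """Framework-agnostic agent logic: it only sees the shared interface."""
    sandbox.write_file("/tmp/task.py", script)
    return sandbox.run("python /tmp/task.py")
```

Any agent framework that depends only on `SandboxLike` could swap `FakeSandbox` for a real backend without touching `agent_step`, which is the property the shared protocol is after.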
Part 2: The Investigation — Architecture Deep Dive
The Layered Architecture
OpenSandbox is structured into clear layers, each solving one concern:
┌────────────────────────────────────────────────────────────┐
│                    Multi-Language SDKs                     │
│        Python │ JS/TS │ Java/Kotlin │ C#/.NET │ Go*        │
└────────────────────┬───────────────────────────────────────┘
                     │  Sandbox Protocol (OpenAPI / OSEPs)
┌────────────────────▼───────────────────────────────────────┐
│                     OpenSandbox Server                     │
│      (Sandbox lifecycle: create, start, pause, kill)       │
└──────┬──────────────────────────────────────────┬──────────┘
       │                                          │
┌──────▼──────┐                         ┌─────────▼──────────┐
│   Docker    │                         │     Kubernetes     │
│   Runtime   │                         │ (high-perf runtime)│
└─────────────┘                         └────────────────────┘
       │                                          │
┌──────▼──────────────────────────────────────────▼──────────┐
│                    Sandbox Environments                    │
│    Commands │ Files │ Code Interpreter │ Browser │ VNC     │
└────────────────────────────────────────────────────────────┘
(*Go SDK is on the roadmap)
Project Structure
| Directory | Purpose |
|---|---|
| sdks/ | Client SDKs (Python, JS/TS, Java, C#) |
| specs/ | OpenAPI + OSEP (OpenSandbox Enhancement Proposals) |
| server/ | The core sandbox server |
| kubernetes/ | Kubernetes runtime for distributed scheduling |
| components/execd/ | Execution daemon inside the sandbox container |
| components/ingress/ | Ingress gateway with multi-routing strategies |
| components/egress/ | Per-sandbox egress/network policy control |
| sandboxes/ | Pre-built sandbox images |
| examples/ | End-to-end integration examples |
Sandbox Protocol (OSEPs)
OpenSandbox uses a formal proposal process called OSEP (OpenSandbox Enhancement Proposals) to evolve the platform. This is similar to PEPs in Python, keeping the protocol community-driven and well-documented. The protocol defines two classes of APIs:
- Lifecycle APIs: create, start, pause, resume, kill → manage the sandbox container
- Execution APIs: commands.run, files.write, files.read, codes.run → interact with what's inside
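The split between the two API classes can be pictured as a small state machine: lifecycle calls move the sandbox between states, and execution calls are only meaningful while it is running. The sketch below is an assumption made for illustration. The state names and allowed transitions are hypothetical and are not taken from the OSEP specs.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    KILLED = "killed"


# Allowed (state, call) -> next state. Assumed for illustration only.
TRANSITIONS = {
    (State.CREATED, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.CREATED, "kill"): State.KILLED,
    (State.RUNNING, "kill"): State.KILLED,
    (State.PAUSED, "kill"): State.KILLED,
}


class SandboxLifecycle:
    def __init__(self) -> None:
        # "create" is modeled as constructing the object.
        self.state = State.CREATED

    def call(self, name: str) -> State:
        """Apply a lifecycle call, rejecting transitions that are not listed."""
        key = (self.state, name)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {name} a sandbox that is {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state

    def can_execute(self) -> bool:
        # Execution calls (commands, files, code) need a running sandbox.
        return self.state is State.RUNNING
```

Keeping the transition table as plain data makes it easy to compare against whatever the real specification allows.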
Security — Strong Isolation Options
This is where OpenSandbox stands apart from naive Docker-only sandboxes. It natively supports secure container runtimes:
- gVisor — userspace kernel that intercepts system calls
- Kata Containers — lightweight VMs with hardware isolation
- Firecracker microVMs — ultra-fast micro-virtual machines (used by AWS Lambda)
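Each adds a stronger boundary than a plain container: gVisor handles system calls in a userspace kernel, while Kata Containers and Firecracker place the workload behind a hardware-virtualized VM boundary, separating sandbox workloads from the host.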
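For intuition about what "selecting a secure runtime" means at the container level, here is a rough sketch using Docker's `--runtime` flag. This is not how OpenSandbox itself is configured (the article does not show that). `docker_run_argv` and `RUNTIME_NAMES` are hypothetical names, `runsc` is the name gVisor's runtime is usually registered under, and the Kata name is an assumption that must match whatever is registered with your Docker daemon. Firecracker is left out of the sketch.

```python
# Map an isolation choice to the name of a Docker runtime.
RUNTIME_NAMES = {
    "default": None,         # plain containers with the daemon's default runtime
    "gvisor": "runsc",       # gVisor's usual runtime name
    "kata": "kata-runtime",  # assumed name, host-specific
}


def docker_run_argv(image: str, isolation: str = "default") -> list[str]:
    """Build a `docker run` command line for the chosen isolation option."""
    if isolation not in RUNTIME_NAMES:
        raise ValueError(f"unknown isolation option: {isolation}")
    argv = ["docker", "run", "--rm"]
    runtime = RUNTIME_NAMES[isolation]
    if runtime is not None:
        # Ask the daemon to start the container under the named runtime.
        argv.append(f"--runtime={runtime}")
    argv.append(image)
    return argv
```

The function only builds the argument list, so it can be inspected or logged before anything is launched.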
Part 3: The Diagnosis — What It Does for Developers
Problem 1: Every AI Agent Framework Reinvents the Same Sandbox
Before OpenSandbox, if you wanted to run Claude Code, Gemini CLI, and LangGraph safely side-by-side, you would need three different sandbox integration layers. OpenSandbox unifies them under one protocol.
Problem 2: Scaling From Laptop to Kubernetes Is Hard
OpenSandbox’s Docker runtime is for local development. Its Kubernetes runtime (kubernetes/) handles distributed, large-scale scheduling of thousands of sandboxes — without changing a single line of your application code. The same SDK calls work locally and in production.
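One way to picture "same calls, different runtime" is a dispatcher that hides the backend behind a single entry point. The sketch below is a toy model, not OpenSandbox's server code. `LaunchPlan`, `BACKENDS`, and `create_sandbox` are hypothetical names, and the backends only describe what they would do.

```python
from dataclasses import dataclass


@dataclass
class LaunchPlan:
    backend: str
    image: str
    detail: str


def plan_docker(image: str) -> LaunchPlan:
    return LaunchPlan("docker", image, "one container on the local daemon")


def plan_kubernetes(image: str) -> LaunchPlan:
    return LaunchPlan("kubernetes", image, "one pod scheduled by the cluster")


# Hypothetical registry keyed by a server-side config value.
BACKENDS = {"docker": plan_docker, "kubernetes": plan_kubernetes}


def create_sandbox(image: str, runtime: str) -> LaunchPlan:
    """Application-facing call: identical regardless of the backend."""
    try:
        return BACKENDS[runtime](image)
    except KeyError:
        raise ValueError(f"unsupported runtime: {runtime}") from None
```

In this model the application only ever calls `create_sandbox`. Moving from laptop to cluster changes the `runtime` value, which would live in server configuration rather than in application code.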
Problem 3: Multi-Language Teams Need Multi-Language SDKs
Currently supported SDKs:
| Language | Status |
|---|---|
| Python | ✅ Stable |
| JavaScript / TypeScript | ✅ Stable |
| Java / Kotlin | ✅ Stable |
| C# / .NET | ✅ Stable |
| Go | 🔜 Roadmap |
Real-World Use Cases
| Scenario | Example |
|---|---|
| Coding Agent | Claude Code, Gemini CLI, OpenAI Codex CLI |
| LLM Workflow | LangGraph state machines creating sandbox jobs |
| GUI Automation | Headless Chrome + Playwright in a sandbox |
| Desktop Environment | VNC + full Linux desktop inside a container |
| Remote Dev | VS Code (code-server) serving from a sandbox |
| RL Training | Run training episodes in isolated containers |
| Agent Evaluation | Reproducible, isolated eval environments |
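For the RL training and evaluation rows, the usual pattern is many short-lived, independent runs. A minimal sketch of that pattern with `asyncio` follows. `run_episode` is a hypothetical stand-in for "create a sandbox, run one episode, tear it down", and the reward it returns is made-up placeholder data.

```python
import asyncio


async def run_episode(episode_id: int) -> dict:
    """Hypothetical stand-in for one isolated episode in its own sandbox."""
    await asyncio.sleep(0)  # placeholder for real sandbox I/O
    return {"episode": episode_id, "reward": episode_id * 0.5}


async def run_batch(n_episodes: int, max_parallel: int = 4) -> list[dict]:
    """Run episodes concurrently, capping how many are in flight at once."""
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(i: int) -> dict:
        async with limit:
            return await run_episode(i)

    # gather preserves input order, so results line up with episode ids.
    return await asyncio.gather(*(guarded(i) for i in range(n_episodes)))
```

Calling `asyncio.run(run_batch(100, max_parallel=8))` would run 100 episodes with at most 8 active at a time. The cap matters because each real episode would hold a container or pod.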
Part 4: The Resolution — How to Use OpenSandbox
Quickstart in 3 Steps
Step 1 — Install and configure the server
uv pip install opensandbox-server
opensandbox-server init-config ~/.sandbox.toml --example docker
Step 2 — Start the sandbox server
opensandbox-server
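Before moving on, it can help to confirm that the server is accepting connections. The helper below is a generic TCP readiness check, not an OpenSandbox feature. `wait_for_port` is a hypothetical name, and the host and port to pass are whatever your `~/.sandbox.toml` configures.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not up yet; retry shortly
    return False
```

A successful connection only shows that something is listening. It does not prove the sandbox API itself is healthy.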
Step 3 — Create a sandbox and run code
import asyncio
from datetime import timedelta

from code_interpreter import CodeInterpreter, SupportedLanguage
from opensandbox import Sandbox
from opensandbox.models import WriteEntry


async def main() -> None:
    # 1. Create a sandbox from a Docker image
    sandbox = await Sandbox.create(
        "opensandbox/code-interpreter:v1.0.1",
        entrypoint=["/opt/opensandbox/code-interpreter.sh"],
        env={"PYTHON_VERSION": "3.11"},
        timeout=timedelta(minutes=10),
    )
    async with sandbox:
        # 2. Run a shell command
        execution = await sandbox.commands.run("echo 'Hello OpenSandbox!'")
        print(execution.logs.stdout[0].text)  # Hello OpenSandbox!

        # 3. Write a file
        await sandbox.files.write_files([
            WriteEntry(path="/tmp/hello.txt", data="Hello World", mode=644)
        ])

        # 4. Read it back
        content = await sandbox.files.read_file("/tmp/hello.txt")
        print(f"Content: {content}")  # Content: Hello World

        # 5. Run Python code inside the sandbox
        interpreter = await CodeInterpreter.create(sandbox)
        # The snippet is kept flush-left so it reaches the interpreter
        # without leading indentation.
        result = await interpreter.codes.run(
            """
import sys
print(sys.version)
result = 2 + 2
result
""",
            language=SupportedLanguage.PYTHON,
        )
        print(result.result[0].text)  # 4
        print(result.logs.stdout[0].text)  # 3.11.x
    # Sandbox auto-cleaned up


if __name__ == "__main__":
    asyncio.run(main())
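The quickstart indexes `stdout[0]` and `result[0]` directly. If a command prints nothing, that list may well be empty (an assumption about the SDK's behavior, not something the article confirms), and the index would raise `IndexError`. A small defensive helper, with the hypothetical name `first_text`, avoids that. The `SimpleNamespace` objects below are stand-ins shaped like the entries used above, not real SDK classes.

```python
from types import SimpleNamespace
from typing import Any, Sequence


def first_text(items: Sequence[Any], default: str = "") -> str:
    """Return `.text` of the first item, or `default` if the list is empty."""
    return items[0].text if items else default


# Stand-ins shaped like the log entries in the quickstart.
with_output = [SimpleNamespace(text="Hello OpenSandbox!")]
no_output: list = []

print(first_text(with_output))          # Hello OpenSandbox!
print(first_text(no_output, "<none>"))  # <none>
```

In the quickstart this would read `first_text(execution.logs.stdout)` in place of `execution.logs.stdout[0].text`.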
Integrating with a Coding Agent (Google ADK Example)
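# examples/google-adk: use OpenSandbox as the tool backend for a Google ADK agent
# Simplified sketch: wiring this method into ADK's tool-calling interface is omitted.
from code_interpreter import CodeInterpreter, SupportedLanguage
from google.adk.tools import BaseTool
from opensandbox import Sandbox


class SandboxRunTool(BaseTool):
    async def run_in_sandbox(self, code: str) -> str:
        # Create a fresh sandbox per call so runs cannot affect each other
        sandbox = await Sandbox.create("opensandbox/code-interpreter:v1.0.1")
        async with sandbox:
            interpreter = await CodeInterpreter.create(sandbox)
            result = await interpreter.codes.run(code, language=SupportedLanguage.PYTHON)
            # Text of the first result, as in the quickstart
            return result.result[0].text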
Running Claude Code or Gemini CLI in a Sandbox
# Clone the examples
git clone https://github.com/alibaba/OpenSandbox.git
cd OpenSandbox/examples/claude-code # or gemini-cli, codex-cli, etc.
# Follow the README in each example directory
Each example ships with a Dockerfile and a startup script that drops the specified AI CLI tool inside a fully managed OpenSandbox environment.
Final Mental Model
┌────────────────────────────────────────────────────────────┐
│                        OpenSandbox                         │
│                                                            │
│  "A universal socket for AI agent execution"               │
│                                                            │
│  What it IS:                                               │
│    → Open protocol sandbox with lifecycle + execution APIs │
│    → Multi-language SDKs (Python, JS, Java, C#)            │
│    → Docker local dev + Kubernetes production scaling      │
│                                                            │
│  What it SOLVES:                                           │
│    → Fragmented sandbox implementations per AI framework   │
│    → Unsafe code execution without isolation               │
│    → Scaling from laptop to cloud without code changes     │
│                                                            │
│  What it ENABLES:                                          │
│    → Coding agents (Claude, Gemini, Codex)                 │
│    → GUI agents (Chrome, Playwright, VNC)                  │
│    → RL training + agent evaluation                        │
│    → Remote dev (VS Code inside a sandbox)                 │
│                                                            │
│  Isolation options: gVisor | Kata Containers | Firecracker │
└────────────────────────────────────────────────────────────┘
GitHub: alibaba/OpenSandbox
Docs: open-sandbox.ai