Files
ragflow/docs/develop/sandbox_spec.md
Zhichang Yu fd11aca8e5 feat: Implement pluggable multi-provider sandbox architecture (#12820)
## Summary

Implement a flexible sandbox provider system supporting both
self-managed (Docker) and SaaS (Aliyun Code Interpreter) backends for
secure code execution in agent workflows.

**Key Changes:**
-  Aliyun Code Interpreter provider using official
`agentrun-sdk>=0.0.16`
-  Self-managed provider with gVisor (runsc) security
-  Arguments parameter support for dynamic code execution
-  Database-only configuration (removed fallback logic)
-  Configuration scripts for quick setup

Issue #12479

## Features

### 🔌 Provider Abstraction Layer

**1. Self-Managed Provider** (`agent/sandbox/providers/self_managed.py`)
- Wraps existing executor_manager HTTP API
- gVisor (runsc) for secure container isolation
- Configurable pool size, timeout, retry logic
- Languages: Python, Node.js, JavaScript
- ⚠️ **Requires**: gVisor installation, Docker, base images

**2. Aliyun Code Interpreter**
(`agent/sandbox/providers/aliyun_codeinterpreter.py`)
- SaaS integration using official agentrun-sdk
- Serverless microVM execution with auto-authentication
- Hard timeout: 30 seconds max
- Credentials: `AGENTRUN_ACCESS_KEY_ID`, `AGENTRUN_ACCESS_KEY_SECRET`,
`AGENTRUN_ACCOUNT_ID`, `AGENTRUN_REGION`
- Automatically wraps code to call `main()` function

**3. E2B Provider** (`agent/sandbox/providers/e2b.py`)
- Placeholder for future integration

### ⚙️ Configuration System

- `conf/system_settings.json`: Default provider =
`aliyun_codeinterpreter`
- `agent/sandbox/client.py`: Enforces database-only configuration
- Admin UI: `/admin/sandbox-settings`
- Configuration validation via `validate_config()` method
- Health checks for all providers

### 🎯 Key Capabilities

**Arguments Parameter Support:**
All providers support passing arguments to `main()` function:
```python
# User code
def main(name: str, count: int) -> dict:
    return {"message": f"Hello {name}!" * count}

# Executed with: arguments={"name": "World", "count": 3}
# Result: {"message": "Hello World!Hello World!Hello World!"}
```

**Self-Describing Providers:**
Each provider implements `get_config_schema()` returning form
configuration for Admin UI

**Error Handling:**
Structured `ExecutionResult` with stdout, stderr, exit_code,
execution_time

## Configuration Scripts

Two scripts for quick Aliyun sandbox setup:

**Shell Script (requires jq):**
```bash
source scripts/configure_aliyun_sandbox.sh
```

**Python Script (interactive):**
```bash
python3 scripts/configure_aliyun_sandbox.py
```

## Testing

```bash
# Unit tests
uv run pytest agent/sandbox/tests/test_providers.py -v

# Aliyun provider tests
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v

# Integration tests (requires credentials)
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v

# Quick SDK validation
python3 agent/sandbox/tests/verify_sdk.py
```

**Test Coverage:**
- 30 unit tests for provider abstraction
- Provider-specific tests for Aliyun
- Integration tests with real API
- Security tests for executor_manager

## Documentation

- `docs/develop/sandbox_spec.md` - Complete architecture specification
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration from legacy
sandbox
- `agent/sandbox/tests/QUICKSTART.md` - Quick start guide
- `agent/sandbox/tests/README.md` - Testing documentation

## Breaking Changes

⚠️ **Migration Required:**

1. **Directory Move**: `sandbox/` → `agent/sandbox/`
   - Update imports: `from sandbox.` → `from agent.sandbox.`

2. **Mandatory Configuration**: 
   - SystemSettings must have `sandbox.provider_type` configured
   - Removed fallback default values
- Configuration must exist in database (from
`conf/system_settings.json`)

3. **Aliyun Credentials**:
   - Requires `AGENTRUN_*` environment variables (not `ALIYUN_*`)
   - `AGENTRUN_ACCOUNT_ID` is now required (Aliyun primary account ID)

4. **Self-Managed Provider**:
   - gVisor (runsc) must be installed for security
   - Install: `go install gvisor.dev/gvisor/runsc@latest`

## Database Schema Changes

```python
# SystemSettings.value: CharField → TextField
api/db/db_models.py: Changed for unlimited config length

# SystemSettingsService.get_by_name(): Fixed query precision
api/db/services/system_settings_service.py: startswith → exact match
```

## Files Changed

### Backend (Python)
- `agent/sandbox/providers/base.py` - SandboxProvider ABC interface
- `agent/sandbox/providers/manager.py` - ProviderManager
- `agent/sandbox/providers/self_managed.py` - Self-managed provider
- `agent/sandbox/providers/aliyun_codeinterpreter.py` - Aliyun provider
- `agent/sandbox/providers/e2b.py` - E2B provider (placeholder)
- `agent/sandbox/client.py` - Unified client (enforces DB-only config)
- `agent/tools/code_exec.py` - Updated to use provider system
- `admin/server/services.py` - SandboxMgr with registry & validation
- `admin/server/routes.py` - 5 sandbox API endpoints
- `conf/system_settings.json` - Default: aliyun_codeinterpreter
- `api/db/db_models.py` - TextField for SystemSettings.value
- `api/db/services/system_settings_service.py` - Exact match query

### Frontend (TypeScript/React)
- `web/src/pages/admin/sandbox-settings.tsx` - Settings UI
- `web/src/services/admin-service.ts` - Sandbox service functions
- `web/src/services/admin.service.d.ts` - Type definitions
- `web/src/utils/api.ts` - Sandbox API endpoints

### Documentation
- `docs/develop/sandbox_spec.md` - Architecture spec
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration guide
- `agent/sandbox/tests/QUICKSTART.md` - Quick start
- `agent/sandbox/tests/README.md` - Testing guide

### Configuration Scripts
- `scripts/configure_aliyun_sandbox.sh` - Shell script (jq)
- `scripts/configure_aliyun_sandbox.py` - Python script

### Tests
- `agent/sandbox/tests/test_providers.py` - 30 unit tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter.py` - Provider tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py` -
Integration tests
- `agent/sandbox/tests/verify_sdk.py` - SDK validation

## Architecture

```
Admin UI → Admin API → SandboxMgr → ProviderManager → [SelfManaged|Aliyun|E2B]
                                      ↓
                                  SystemSettings
```

## Usage

### 1. Configure Provider

**Via Admin UI:**
1. Navigate to `/admin/sandbox-settings`
2. Select provider (Aliyun Code Interpreter / Self-Managed)
3. Fill in configuration
4. Click "Test Connection" to verify
5. Click "Save" to apply

**Via Configuration Scripts:**
```bash
# Aliyun provider
export AGENTRUN_ACCESS_KEY_ID="xxx"
export AGENTRUN_ACCESS_KEY_SECRET="yyy"
export AGENTRUN_ACCOUNT_ID="zzz"
export AGENTRUN_REGION="cn-shanghai"
source scripts/configure_aliyun_sandbox.sh
```

### 2. Restart Service

```bash
cd docker
docker compose restart ragflow-server
```

### 3. Execute Code in Agent

```python
from agent.sandbox.client import execute_code

result = execute_code(
    code='def main(name: str) -> dict: return {"message": f"Hello {name}!"}',
    language="python",
    timeout=30,
    arguments={"name": "World"}
)

print(result.stdout)  # {"message": "Hello World!"}
```

## Troubleshooting

### "Container pool is busy" (Self-Managed)
- **Cause**: Pool exhausted (default: 1 container in `.env`)
- **Fix**: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` to 5+

### "Sandbox provider type not configured"
- **Cause**: Database missing configuration
- **Fix**: Run config script or set via Admin UI

### "gVisor not found"
- **Cause**: runsc not installed
- **Fix**: `go install gvisor.dev/gvisor/runsc@latest && sudo cp
~/go/bin/runsc /usr/local/bin/`

### Aliyun authentication errors
- **Cause**: Wrong environment variable names
- **Fix**: Use `AGENTRUN_*` prefix (not `ALIYUN_*`)

## Checklist

- [x] All tests passing (30 unit tests + integration tests)
- [x] Documentation updated (spec, migration guide, quickstart)
- [x] Type definitions added (TypeScript)
- [x] Admin UI implemented
- [x] Configuration validation
- [x] Health checks implemented
- [x] Error handling with structured results
- [x] Breaking changes documented
- [x] Configuration scripts created
- [x] gVisor requirements documented

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 13:28:21 +08:00

1838 lines
56 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# RAGFlow Sandbox Multi-Provider Architecture - Design Specification
## 1. Overview
### 1.1 Goals
Enable RAGFlow to support multiple sandbox deployment modes:
- **Self-Managed**: On-premise deployment using Daytona/Docker (current implementation)
- **SaaS Providers**: Cloud-based sandbox services (Aliyun Code Interpreter, E2B)
### 1.2 Key Requirements
- Provider-agnostic interface for sandbox operations
- Admin-configurable provider settings with dynamic schema
- Multi-tenant isolation (1:1 session-to-sandbox mapping)
- Graceful fallback and error handling
- Unified monitoring and observability
## 2. Architecture Design
### 2.1 Provider Abstraction Layer
**Location**: `agent/sandbox/providers/`
Define a unified `SandboxProvider` interface:
```python
# agent/sandbox/providers/base.py
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional
from dataclasses import dataclass
@dataclass
class SandboxInstance:
instance_id: str
provider: str
status: str # running, stopped, error
metadata: Dict[str, Any]
@dataclass
class ExecutionResult:
stdout: str
stderr: str
exit_code: int
execution_time: float
metadata: Dict[str, Any]
class SandboxProvider(ABC):
"""Base interface for all sandbox providers"""
@abstractmethod
def initialize(self, config: Dict[str, Any]) -> bool:
"""Initialize provider with configuration"""
pass
@abstractmethod
def create_instance(self, template: str = "python") -> SandboxInstance:
"""Create a new sandbox instance"""
pass
@abstractmethod
def execute_code(
self,
instance_id: str,
code: str,
language: str,
timeout: int = 10
) -> ExecutionResult:
"""Execute code in the sandbox"""
pass
@abstractmethod
def destroy_instance(self, instance_id: str) -> bool:
"""Destroy a sandbox instance"""
pass
@abstractmethod
def health_check(self) -> bool:
"""Check if provider is healthy"""
pass
@abstractmethod
def get_supported_languages(self) -> list[str]:
"""Get list of supported programming languages"""
pass
@staticmethod
def get_config_schema() -> Dict[str, Dict]:
"""
Return configuration schema for this provider.
Returns a dictionary mapping field names to their schema definitions,
including type, required status, validation rules, labels, and descriptions.
"""
pass
def validate_config(self, config: Dict[str, Any]) -> tuple[bool, Optional[str]]:
"""
Validate provider-specific configuration.
This method allows providers to implement custom validation logic beyond
the basic schema validation. Override this method to add provider-specific
checks like URL format validation, API key format validation, etc.
Args:
config: Configuration dictionary to validate
Returns:
Tuple of (is_valid, error_message):
- is_valid: True if configuration is valid, False otherwise
- error_message: Error message if invalid, None if valid
"""
# Default implementation: no custom validation
return True, None
```
### 2.2 Provider Implementations
#### 2.2.1 Self-Managed Provider
**File**: `agent/sandbox/providers/self_managed.py`
Wraps the existing executor_manager implementation.
**Prerequisites**:
- **gVisor (runsc)**: Required for secure container isolation. Install with:
```bash
go install gvisor.dev/gvisor/runsc@latest
sudo cp ~/go/bin/runsc /usr/local/bin/
runsc --version
```
Or download from: https://github.com/google/gvisor/releases
- **Docker**: Docker runtime with gVisor support
- **Base Images**: Pull sandbox base images:
```bash
docker pull infiniflow/sandbox-base-python:latest
docker pull infiniflow/sandbox-base-nodejs:latest
```
**Configuration**: Docker API endpoint, pool size, resource limits
- `endpoint`: HTTP endpoint (default: "http://localhost:9385")
- `timeout`: Request timeout in seconds (default: 30)
- `max_retries`: Maximum retry attempts (default: 3)
- `pool_size`: Container pool size (default: 10)
**Languages**: Python, Node.js, JavaScript
**Security**: gVisor (runsc runtime), seccomp, read-only filesystem, memory limits
**Advantages**:
- Low latency (<90ms), data privacy, full control
- No per-execution costs
- Supports `arguments` parameter for passing data to `main()` function
**Limitations**:
- Operational overhead, finite resources
- Requires gVisor installation for security
- Pool exhaustion causes "Container pool is busy" errors
**Common Issues**:
- **"Container pool is busy"**: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` (default: 1 in .env, should be 5+)
- **Container creation fails**: Ensure gVisor is installed and accessible at `/usr/local/bin/runsc`
#### 2.2.2 Aliyun Code Interpreter Provider
**File**: `agent/sandbox/providers/aliyun_codeinterpreter.py`
SaaS integration with Aliyun Function Compute Code Interpreter service using the official agentrun-sdk.
**Official Resources**:
- API Documentation: https://help.aliyun.com/zh/functioncompute/fc/sandbox-sandbox-code-interepreter
- Official SDK: https://github.com/Serverless-Devs/agentrun-sdk-python
- SDK Docs: https://docs.agent.run
**Implementation**:
- Uses official `agentrun-sdk` package
- SDK handles authentication (AccessKey signature) automatically
- Supports environment variable configuration
- Structured error handling with `ServerError` exceptions
**Configuration**:
- `access_key_id`: Aliyun AccessKey ID
- `access_key_secret`: Aliyun AccessKey Secret
- `account_id`: Aliyun primary account ID (主账号ID) - Required for API calls
- `region`: Region (cn-hangzhou, cn-beijing, cn-shanghai, cn-shenzhen, cn-guangzhou)
- `template_name`: Optional sandbox template name for pre-configured environments
- `timeout`: Execution timeout (max 30 seconds - hard limit)
**Languages**: Python, JavaScript
**Security**: Serverless microVM isolation, 30-second hard timeout limit
**Advantages**:
- Official SDK with automatic signature handling
- Unlimited scalability, no maintenance
- China region support with low latency
- Built-in file system management
- Support for execution contexts (Jupyter kernel)
- Context-based execution for state persistence
**Limitations**:
- Network dependency
- 30-second execution time limit (hard limit)
- Pay-as-you-go costs
- Requires Aliyun primary account ID for API calls
**Setup Instructions - Creating a RAM User with Minimal Privileges**:
⚠️ **Security Warning**: Never use your Aliyun primary account (root account) AccessKey for SDK operations. Primary accounts have full resource permissions, and leaked credentials pose significant security risks.
**Step 1: Create a RAM User**
1. Log in to [RAM Console](https://ram.console.aliyun.com/)
2. Navigate to **People** → **Users**
3. Click **Create User**
4. Configure the user:
- **Username**: e.g., `ragflow-sandbox-user`
- **Display Name**: e.g., `RAGFlow Sandbox Service Account`
- **Access Mode**: Check ✅ **OpenAPI/Programmatic Access** (this creates an AccessKey)
- **Console Login**: Optional (not needed for SDK-only access)
5. Click **OK** and save the AccessKey ID and Secret immediately (displayed only once!)
**Step 2: Create a Custom Authorization Policy**
Navigate to **Permissions** → **Policies** → **Create Policy** → **Custom Policy** → **Configuration Script (JSON)**
Choose one of the following policy options based on your security requirements:
**Option A: Minimal Privilege Policy (Recommended)**
Grants only the permissions required by the AgentRun SDK:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"agentrun:CreateTemplate",
"agentrun:GetTemplate",
"agentrun:UpdateTemplate",
"agentrun:DeleteTemplate",
"agentrun:ListTemplates",
"agentrun:CreateSandbox",
"agentrun:GetSandbox",
"agentrun:DeleteSandbox",
"agentrun:StopSandbox",
"agentrun:ListSandboxes",
"agentrun:CreateContext",
"agentrun:ExecuteCode",
"agentrun:DeleteContext",
"agentrun:ListContexts",
"agentrun:CreateFile",
"agentrun:GetFile",
"agentrun:DeleteFile",
"agentrun:ListFiles",
"agentrun:CreateProcess",
"agentrun:GetProcess",
"agentrun:KillProcess",
"agentrun:ListProcesses",
"agentrun:CreateRecording",
"agentrun:GetRecording",
"agentrun:DeleteRecording",
"agentrun:ListRecordings",
"agentrun:CheckHealth"
],
"Resource": [
"acs:agentrun:*:{account_id}:template/*",
"acs:agentrun:*:{account_id}:sandbox/*"
]
}
]
}
```
> Replace `{account_id}` with your Aliyun primary account ID
**Option B: Resource-Level Privilege Control (Most Secure)**
Limits access to specific resource prefixes:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"agentrun:CreateTemplate",
"agentrun:GetTemplate",
"agentrun:UpdateTemplate",
"agentrun:DeleteTemplate",
"agentrun:ListTemplates"
],
"Resource": "acs:agentrun:*:{account_id}:template/ragflow-*"
},
{
"Effect": "Allow",
"Action": [
"agentrun:CreateSandbox",
"agentrun:GetSandbox",
"agentrun:DeleteSandbox",
"agentrun:StopSandbox",
"agentrun:ListSandboxes",
"agentrun:CheckHealth"
],
"Resource": "acs:agentrun:*:{account_id}:sandbox/*"
},
{
"Effect": "Allow",
"Action": ["agentrun:*"],
"Resource": "acs:agentrun:*:{account_id}:sandbox/*/context/*"
},
{
"Effect": "Allow",
"Action": ["agentrun:*"],
"Resource": "acs:agentrun:*:{account_id}:sandbox/*/file/*"
},
{
"Effect": "Allow",
"Action": ["agentrun:*"],
"Resource": "acs:agentrun:*:{account_id}:sandbox/*/process/*"
},
{
"Effect": "Allow",
"Action": ["agentrun:*"],
"Resource": "acs:agentrun:*:{account_id}:sandbox/*/recording/*"
}
]
}
```
> This limits template creation to only those prefixed with `ragflow-*`
**Option C: Full Access (Not Recommended for Production)**
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "agentrun:*",
"Resource": "*"
}
]
}
```
**Step 3: Authorize the RAM User**
1. Return to **Users** list
2. Find the user you just created (e.g., `ragflow-sandbox-user`)
3. Click **Add Permissions** in the Actions column
4. In the **Custom Policy** tab, select the policy you created in Step 2
5. Click **OK**
**Step 4: Configure RAGFlow with the RAM User Credentials**
After creating the RAM user and obtaining the AccessKey, configure it in RAGFlow's admin settings or environment variables:
```bash
# Method 1: Environment variables (for development/testing)
export AGENTRUN_ACCESS_KEY_ID="LTAI5t..." # RAM user's AccessKey ID
export AGENTRUN_ACCESS_KEY_SECRET="xxx..." # RAM user's AccessKey Secret
export AGENTRUN_ACCOUNT_ID="123456789..." # Your primary account ID
export AGENTRUN_REGION="cn-hangzhou"
```
Or via Admin UI (recommended for production):
1. Navigate to **Admin Settings** → **Sandbox Providers**
2. Select **Aliyun Code Interpreter** provider
3. Fill in the configuration:
- `access_key_id`: RAM user's AccessKey ID
- `access_key_secret`: RAM user's AccessKey Secret
- `account_id`: Your primary account ID
- `region`: e.g., `cn-hangzhou`
**Step 5: Verify Permissions**
Test if the RAM user permissions are correctly configured:
```python
from agentrun.sandbox import Sandbox, TemplateInput, TemplateType
try:
# Test template creation
template = Sandbox.create_template(
input=TemplateInput(
template_name="ragflow-permission-test",
template_type=TemplateType.CODE_INTERPRETER
)
)
print("✅ RAM user permissions are correctly configured")
except Exception as e:
print(f"❌ Permission test failed: {e}")
finally:
# Cleanup test resources
try:
Sandbox.delete_template("ragflow-permission-test")
except:
pass
```
**Security Best Practices**:
1. ✅ **Always use RAM user AccessKeys**, never primary account AccessKeys
2. ✅ **Follow the principle of least privilege** - grant only necessary permissions
3. ✅ **Rotate AccessKeys regularly** - recommend every 3-6 months
4. ✅ **Enable MFA** - enable multi-factor authentication for RAM users
5. ✅ **Use secure storage** - store credentials in environment variables or secret management services, never hardcode in code
6. ✅ **Restrict IP access** - add IP whitelist policies for RAM users if needed
7. ✅ **Monitor access logs** - regularly check RAM user access logs in CloudTrail
**Reference Links**:
- [Aliyun RAM Documentation](https://help.aliyun.com/product/28625.html)
- [RAM Policy Language](https://help.aliyun.com/document_detail/100676.html)
- [AgentRun Official Documentation](https://docs.agent.run)
- [AgentRun SDK GitHub](https://github.com/Serverless-Devs/agentrun-sdk-python)
#### 2.2.3 E2B Provider
**File**: `agent/sandbox/providers/e2b.py`
SaaS integration with E2B Cloud.
- **Configuration**: api_key, region (us/eu)
- **Languages**: Python, JavaScript, Go, Bash, etc.
- **Security**: Firecracker microVMs
- **Advantages**: Global CDN, fast startup, multiple language support
- **Limitations**: International network latency for China users
### 2.3 Provider Management
**File**: `agent/sandbox/providers/manager.py`
Since we only use one active provider at a time (configured globally), the provider management is simplified:
```python
class ProviderManager:
"""Manages the currently active sandbox provider"""
def __init__(self):
self.current_provider: Optional[SandboxProvider] = None
self.current_provider_name: Optional[str] = None
def set_provider(self, name: str, provider: SandboxProvider):
"""Set the active provider"""
self.current_provider = provider
self.current_provider_name = name
def get_provider(self) -> Optional[SandboxProvider]:
"""Get the active provider"""
return self.current_provider
def get_provider_name(self) -> Optional[str]:
"""Get the active provider name"""
return self.current_provider_name
```
**Rationale**: With global configuration, there's only one active provider at a time. The provider manager simply holds a reference to the currently active provider, making it a thin wrapper rather than a complex multi-provider manager.
## 3. Admin Configuration
### 3.1 Database Schema
Use the existing **SystemSettings** table for global sandbox configuration:
```python
# In api/db/db_models.py
class SystemSettings(DataBaseModel):
name = CharField(max_length=128, primary_key=True)
source = CharField(max_length=32, null=False, index=False)
data_type = CharField(max_length=32, null=False, index=False)
value = CharField(max_length=1024, null=False, index=False)
```
**Rationale**: Sandbox manager is a **system-level service** shared by all tenants:
- No per-tenant configuration needed (unlike LLM providers where each tenant has their own API keys)
- Global settings like system email, DOC_ENGINE, etc.
- Managed by administrators only
- Leverages existing `SettingsMgr` in admin interface
**Storage Strategy**: Each provider's configuration stored as a **single JSON object**:
- `sandbox.provider_type` - Active provider selection ("self_managed", "aliyun_codeinterpreter", "e2b")
- `sandbox.self_managed` - JSON config for self-managed provider
- `sandbox.aliyun_codeinterpreter` - JSON config for Aliyun Code Interpreter provider
- `sandbox.e2b` - JSON config for E2B provider
**Note**: The `value` field has a 1024 character limit, which should be sufficient for typical sandbox configurations. If larger configs are needed, consider using a TextField or a separate configuration table.
### 3.2 Configuration Schema
Each provider's configuration is stored as a **single JSON object** in the `value` field:
#### Self-Managed Provider
```json
{
"name": "sandbox.self_managed",
"source": "variable",
"data_type": "json",
"value": "{\"endpoint\": \"http://localhost:9385\", \"pool_size\": 10, \"max_memory\": \"256m\", \"timeout\": 30}"
}
```
#### Aliyun Code Interpreter
```json
{
"name": "sandbox.aliyun_codeinterpreter",
"source": "variable",
"data_type": "json",
"value": "{\"access_key_id\": \"LTAI5t...\", \"access_key_secret\": \"xxxxx\", \"account_id\": \"1234567890...\", \"region\": \"cn-hangzhou\", \"timeout\": 30}"
}
```
#### E2B
```json
{
"name": "sandbox.e2b",
"source": "variable",
"data_type": "json",
"value": "{\"api_key\": \"e2b_sk_...\", \"region\": \"us\", \"timeout\": 30}"
}
```
#### Active Provider Selection
```json
{
"name": "sandbox.provider_type",
"source": "variable",
"data_type": "string",
"value": "self_managed"
}
```
### 3.3 Provider Self-Describing Schema
Each provider class implements a static method to describe its configuration schema:
```python
# agent/sandbox/providers/base.py
class SandboxProvider(ABC):
"""Base interface for all sandbox providers"""
@abstractmethod
def initialize(self, config: Dict[str, Any]) -> bool:
"""Initialize provider with configuration"""
pass
@abstractmethod
def create_instance(self, template: str = "python") -> SandboxInstance:
"""Create a new sandbox instance"""
pass
@abstractmethod
def execute_code(
self,
instance_id: str,
code: str,
language: str,
timeout: int = 10
) -> ExecutionResult:
"""Execute code in the sandbox"""
pass
@abstractmethod
def destroy_instance(self, instance_id: str) -> bool:
"""Destroy a sandbox instance"""
pass
@abstractmethod
def health_check(self) -> bool:
"""Check if provider is healthy"""
pass
@abstractmethod
def get_supported_languages(self) -> list[str]:
"""Get list of supported programming languages"""
pass
@staticmethod
def get_config_schema() -> Dict[str, Dict]:
"""Return configuration schema for this provider"""
return {}
```
**Example Implementation**:
```python
# agent/sandbox/providers/self_managed.py
class SelfManagedProvider(SandboxProvider):
@staticmethod
def get_config_schema() -> Dict[str, Dict]:
return {
"endpoint": {
"type": "string",
"required": True,
"label": "API Endpoint",
"placeholder": "http://localhost:9385"
},
"pool_size": {
"type": "integer",
"default": 10,
"label": "Container Pool Size",
"min": 1,
"max": 100
},
"max_memory": {
"type": "string",
"default": "256m",
"label": "Max Memory per Container",
"options": ["128m", "256m", "512m", "1g"]
},
"timeout": {
"type": "integer",
"default": 30,
"label": "Execution Timeout (seconds)",
"min": 5,
"max": 300
}
}
# agent/sandbox/providers/aliyun_codeinterpreter.py
class AliyunCodeInterpreterProvider(SandboxProvider):
@staticmethod
def get_config_schema() -> Dict[str, Dict]:
return {
"access_key_id": {
"type": "string",
"required": True,
"secret": True,
"label": "Access Key ID",
"description": "Aliyun AccessKey ID for authentication"
},
"access_key_secret": {
"type": "string",
"required": True,
"secret": True,
"label": "Access Key Secret",
"description": "Aliyun AccessKey Secret for authentication"
},
"account_id": {
"type": "string",
"required": True,
"label": "Account ID",
"description": "Aliyun primary account ID (主账号ID), required for API calls"
},
"region": {
"type": "string",
"default": "cn-hangzhou",
"label": "Region",
"options": ["cn-hangzhou", "cn-beijing", "cn-shanghai", "cn-shenzhen", "cn-guangzhou"],
"description": "Aliyun region for Code Interpreter service"
},
"template_name": {
"type": "string",
"required": False,
"label": "Template Name",
"description": "Optional sandbox template name for pre-configured environments"
},
"timeout": {
"type": "integer",
"default": 30,
"label": "Execution Timeout (seconds)",
"min": 1,
"max": 30,
"description": "Code execution timeout (max 30 seconds - hard limit)"
}
}
# agent/sandbox/providers/e2b.py
class E2BProvider(SandboxProvider):
@staticmethod
def get_config_schema() -> Dict[str, Dict]:
return {
"api_key": {
"type": "string",
"required": True,
"secret": True,
"label": "API Key"
},
"region": {
"type": "string",
"default": "us",
"label": "Region",
"options": ["us", "eu"]
},
"timeout": {
"type": "integer",
"default": 30,
"label": "Execution Timeout (seconds)",
"min": 5,
"max": 300
}
}
```
**Benefits of Self-Describing Providers**:
- Single source of truth - schema defined alongside implementation
- Easy to add new providers - no central registry to update
- Type safety - schema stays in sync with provider code
- Flexible - frontend can use schema for validation or hardcode if preferred
### 3.4 Admin API Endpoints
Follow existing pattern in `admin/server/routes.py` and use `SettingsMgr`:
```python
# admin/server/routes.py (add new endpoints)
from flask import request, jsonify
import json
from api.db.services.system_settings_service import SystemSettingsService
from agent.agent.sandbox.providers.self_managed import SelfManagedProvider
from agent.agent.sandbox.providers.aliyun_codeinterpreter import AliyunCodeInterpreterProvider
from agent.agent.sandbox.providers.e2b import E2BProvider
from admin.server.services import SettingsMgr
# Map provider IDs to their classes
PROVIDER_CLASSES = {
"self_managed": SelfManagedProvider,
"aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
"e2b": E2BProvider,
}
@admin_bp.route('/api/admin/sandbox/providers', methods=['GET'])
def list_sandbox_providers():
"""List available sandbox providers with their schemas"""
providers = []
for provider_id, provider_class in PROVIDER_CLASSES.items():
schema = provider_class.get_config_schema()
providers.append({
"id": provider_id,
"name": provider_id.replace("_", " ").title(),
"config_schema": schema
})
return jsonify({"data": providers})
@admin_bp.route('/api/admin/sandbox/config', methods=['GET'])
def get_sandbox_config():
"""Get current sandbox configuration"""
# Get active provider
active_provider_setting = SystemSettingsService.get_by_name("sandbox.provider_type")
active_provider = active_provider_setting[0].value if active_provider_setting else None
config = {"active": active_provider}
# Load all provider configs
for provider_id in PROVIDER_CLASSES.keys():
setting = SystemSettingsService.get_by_name(f"sandbox.{provider_id}")
if setting:
try:
config[provider_id] = json.loads(setting[0].value)
except json.JSONDecodeError:
config[provider_id] = {}
else:
# Return default values from schema
provider_class = PROVIDER_CLASSES[provider_id]
schema = provider_class.get_config_schema()
config[provider_id] = {
key: field_def.get("default", "")
for key, field_def in schema.items()
}
return jsonify({"data": config})
@admin_bp.route('/api/admin/sandbox/config', methods=['POST'])
def set_sandbox_config():
"""
Update sandbox provider configuration.
Request Parameters:
- provider_type: Provider identifier (e.g., "self_managed", "e2b")
- config: Provider configuration dictionary
- set_active: (optional) If True, also set this provider as active.
Default: True for backward compatibility.
Set to False to update config without switching providers.
- test_connection: (optional) If True, test connection before saving
Response: Success message
"""
req = request.json
provider_type = req.get('provider_type')
config = req.get('config')
set_active = req.get('set_active', True) # Default to True
# Validate provider exists
if provider_type not in PROVIDER_CLASSES:
return jsonify({"error": "Unknown provider"}), 400
# Validate configuration against schema
provider_class = PROVIDER_CLASSES[provider_type]
schema = provider_class.get_config_schema()
validation_result = validate_config(config, schema)
if not validation_result.valid:
return jsonify({"error": "Invalid config", "details": validation_result.errors}), 400
# Test connection if requested
if req.get('test_connection'):
test_result = test_provider_connection(provider_type, config)
if not test_result.success:
return jsonify({"error": "Connection failed", "details": test_result.error}), 400
# Store entire config as a single JSON record
config_json = json.dumps(config)
setting_name = f"sandbox.{provider_type}"
existing = SystemSettingsService.get_by_name(setting_name)
if existing:
SettingsMgr.update_by_name(setting_name, config_json)
else:
SystemSettingsService.save(
name=setting_name,
source="variable",
data_type="json",
value=config_json
)
# Set as active provider if requested (default: True)
if set_active:
SettingsMgr.update_by_name("sandbox.provider_type", provider_type)
return jsonify({"message": "Configuration saved"})
@admin_bp.route('/api/admin/sandbox/test', methods=['POST'])
def test_sandbox_connection():
"""Test connection to sandbox provider"""
provider_type = request.json.get('provider_type')
config = request.json.get('config')
test_result = test_provider_connection(provider_type, config)
return jsonify({
"success": test_result.success,
"message": test_result.message,
"latency_ms": test_result.latency_ms
})
@admin_bp.route('/api/admin/sandbox/active', methods=['PUT'])
def set_active_sandbox_provider():
"""Set active sandbox provider"""
provider_name = request.json.get('provider')
if provider_name not in PROVIDER_CLASSES:
return jsonify({"error": "Unknown provider"}), 400
# Check if provider is configured
provider_setting = SystemSettingsService.get_by_name(f"sandbox.{provider_name}")
if not provider_setting:
return jsonify({"error": "Provider not configured"}), 400
SettingsMgr.update_by_name("sandbox.provider_type", provider_name)
return jsonify({"message": "Active provider updated"})
```
## 4. Frontend Integration
### 4.1 Admin Settings UI
**Location**: `web/src/pages/SandboxSettings/index.tsx`
```typescript
import { Form, Select, Input, Button, Card, Space, Tag, message } from 'antd';
import { listSandboxProviders, getSandboxConfig, setSandboxConfig, testSandboxConnection } from '@/utils/api';
const SandboxSettings: React.FC = () => {
const [providers, setProviders] = useState<Provider[]>([]);
const [configs, setConfigs] = useState<Config[]>([]);
const [selectedProvider, setSelectedProvider] = useState<string>('');
const [testing, setTesting] = useState(false);
const providerSchema = providers.find(p => p.id === selectedProvider);
const renderConfigForm = () => {
if (!providerSchema) return null;
return (
<Form layout="vertical">
{Object.entries(providerSchema.config_schema).map(([key, schema]) => (
<Form.Item
key={key}
name={key}
label={schema.label}
rules={[{ required: schema.required }]}
>
{schema.secret ? (
<Input.Password placeholder={schema.placeholder} />
) : schema.type === 'integer' ? (
<InputNumber min={schema.min} max={schema.max} />
) : schema.options ? (
<Select>
{schema.options.map((opt: string) => (
<Option key={opt} value={opt}>{opt}</Option>
))}
</Select>
) : (
<Input placeholder={schema.placeholder} />
)}
</Form.Item>
))}
</Form>
);
};
return (
<Card title="Sandbox Provider Configuration">
<Space direction="vertical" style={{ width: '100%' }}>
{/* Provider Selection */}
<Form.Item label="Select Provider">
<Select
style={{ width: '100%' }}
onChange={setSelectedProvider}
value={selectedProvider}
>
{providers.map(provider => (
<Option key={provider.id} value={provider.id}>
<Space>
<Icon type={provider.icon} />
{provider.name}
{provider.tags.map(tag => (
<Tag key={tag}>{tag}</Tag>
))}
</Space>
</Option>
))}
</Select>
</Form.Item>
{/* Dynamic Configuration Form */}
{renderConfigForm()}
{/* Actions */}
<Space>
<Button type="primary" onClick={handleSave}>
Save Configuration
</Button>
<Button onClick={handleTest} loading={testing}>
Test Connection
</Button>
</Space>
</Space>
</Card>
);
};
```
### 4.2 API Client
**File**: `web/src/utils/api.ts`
```typescript
export async function listSandboxProviders() {
return request<{ data: Provider[] }>('/api/admin/sandbox/providers');
}
export async function getSandboxConfig() {
return request<{ data: SandboxConfig }>('/api/admin/sandbox/config');
}
export async function setSandboxConfig(config: SandboxConfigRequest) {
return request('/api/admin/sandbox/config', {
method: 'POST',
data: config,
});
}
export async function testSandboxConnection(provider: string, config: any) {
return request('/api/admin/sandbox/test', {
method: 'POST',
data: { provider, config },
});
}
export async function setActiveSandboxProvider(provider: string) {
return request('/api/admin/sandbox/active', {
method: 'PUT',
data: { provider },
});
}
```
### 4.3 Type Definitions
**File**: `web/src/types/sandbox.ts`
```typescript
interface Provider {
id: string;
name: string;
description: string;
icon: string;
tags: string[];
config_schema: Record<string, ConfigField>;
supported_languages: string[];
}
interface ConfigField {
type: 'string' | 'integer' | 'boolean';
required: boolean;
secret?: boolean;
label: string;
placeholder?: string;
default?: any;
options?: string[];
min?: number;
max?: number;
}
// Configuration response grouped by provider
interface SandboxConfig {
active: string; // Currently active provider
self_managed?: Record<string, string>;
aliyun_codeinterpreter?: Record<string, string>;
e2b?: Record<string, string>;
// Add more providers as needed
}
// Request to update provider configuration
interface SandboxConfigRequest {
provider_type: string;
config: Record<string, string | number | boolean>;
test_connection?: boolean;
set_active?: boolean;
}
```
## 5. Integration with Agent System
### 5.1 Agent Component Usage
The agent system will use the sandbox through the simplified provider manager, loading global configuration from SystemSettings:
```python
# In agent/components/code_executor.py
import json
from agent.agent.sandbox.providers.manager import ProviderManager
from agent.agent.sandbox.providers.self_managed import SelfManagedProvider
from agent.agent.sandbox.providers.aliyun_codeinterpreter import AliyunCodeInterpreterProvider
from agent.agent.sandbox.providers.e2b import E2BProvider
from api.db.services.system_settings_service import SystemSettingsService
# Map provider IDs to their classes
PROVIDER_CLASSES = {
"self_managed": SelfManagedProvider,
"aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
"e2b": E2BProvider,
}
class CodeExecutorComponent:
def __init__(self):
self.provider_manager = ProviderManager()
self._load_active_provider()
def _load_active_provider(self):
"""Load the active provider from system settings"""
# Get active provider
active_setting = SystemSettingsService.get_by_name("sandbox.provider_type")
if not active_setting:
raise RuntimeError("No sandbox provider configured")
active_provider = active_setting[0].value
# Load configuration for active provider (single JSON record)
provider_setting = SystemSettingsService.get_by_name(f"sandbox.{active_provider}")
if not provider_setting:
raise RuntimeError(f"Sandbox provider {active_provider} not configured")
# Parse JSON configuration
try:
config = json.loads(provider_setting[0].value)
except json.JSONDecodeError as e:
raise RuntimeError(f"Invalid sandbox configuration for {active_provider}: {e}")
# Get provider class
provider_class = PROVIDER_CLASSES.get(active_provider)
if not provider_class:
raise RuntimeError(f"Unknown provider: {active_provider}")
# Initialize provider
provider = provider_class()
provider.initialize(config)
# Set as active provider in manager
self.provider_manager.set_provider(active_provider, provider)
def execute(self, code: str, language: str) -> ExecutionResult:
"""Execute code using the active provider"""
provider = self.provider_manager.get_provider()
if not provider:
raise RuntimeError("No sandbox provider configured")
# Create instance
instance = provider.create_instance(template=language)
try:
# Execute code
result = provider.execute_code(
instance_id=instance.instance_id,
code=code,
language=language
)
return result
finally:
# Always cleanup
provider.destroy_instance(instance.instance_id)
```
## 6. Security Considerations
### 6.1 Credential Storage
- Sensitive credentials (API keys, secrets) encrypted at rest in database
- Use RAGFlow's existing encryption mechanisms (AES-256)
- Never log or expose credentials in error messages or API responses
- Credentials redacted in UI (show only last 4 characters)
### 6.2 Tenant Isolation
- **Configuration**: Global sandbox settings shared by all tenants (admin-only access)
- **Execution**: Sandboxes never shared across tenants/sessions during runtime
- **Instance IDs**: Scoped to tenant: `{tenant_id}:{session_id}:{instance_id}`
- **Network Isolation**: Between tenant sandboxes (VPC per tenant for SaaS providers)
- **Resource Quotas**: Per-tenant limits on concurrent executions, total execution time
- **Audit Logging**: All sandbox executions logged with tenant_id for traceability
### 6.3 Resource Limits
- Timeout limits per execution (configurable per provider, default 30s)
- Memory/CPU limits enforced at provider level
- Automatic cleanup of stale instances (max lifetime: 5 minutes)
- Rate limiting per tenant (max concurrent executions: 10)
### 6.4 Code Security
- For self-managed: AST-based security analysis before execution
- Blocked operations: file system writes, network calls, system commands
- Allowlist approach: only specific imports allowed
- Runtime monitoring for malicious patterns
### 6.5 Network Security
- Self-managed: Network isolation by default, no external access
- SaaS: HTTPS only, certificate pinning
- IP whitelisting for self-managed endpoint access
## 7. Monitoring and Observability
### 7.1 Metrics to Track
**Common Metrics (All Providers)**:
- Execution success rate (target: >95%)
- Average execution time (p50, p95, p99)
- Error rate by error type
- Active instance count
- Queue depth (for self-managed pool)
**Self-Managed Specific**:
- Container pool utilization (target: 60-80%)
- Host resource usage (CPU, memory, disk)
- Container creation latency
- Container restart rate
- gVisor runtime health
**SaaS Specific**:
- API call latency by region
- Rate limit usage and throttling events
- Cost estimation (execution count × unit cost)
- Provider availability (uptime %)
- API error rate by error code
### 7.2 Logging
Structured logging for all provider operations:
```json
{
"timestamp": "2025-01-26T10:00:00Z",
"tenant_id": "tenant_123",
"provider": "aliyun_codeinterpreter",
"operation": "execute_code",
"instance_id": "inst_xyz",
"language": "python",
"code_hash": "sha256:...",
"duration_ms": 1234,
"status": "success",
"exit_code": 0,
"memory_used_mb": 64,
"region": "cn-hangzhou"
}
```
### 7.3 Alerts
**Critical Alerts**:
- Provider availability < 99%
- Error rate > 5%
- Average execution time > 10s
- Container pool exhaustion (0 available)
**Warning Alerts**:
- Cost spike (2x daily average)
- Rate limit approaching (>80%)
- High memory usage (>90%)
- Slow execution times (p95 > 5s)
## 8. Migration Path
### 8.1 Phase 1: Refactor Existing Code (Week 1-2)
**Goals**: Extract current implementation into provider pattern
**Tasks**:
- [ ] Create `agent/sandbox/providers/base.py` with `SandboxProvider` interface
- [ ] Implement `agent/sandbox/providers/self_managed.py` wrapping executor_manager
- [ ] Create `agent/sandbox/providers/manager.py` for provider management
- [ ] Write unit tests for self-managed provider
- [ ] Document existing behavior and configuration
**Deliverables**:
- Provider abstraction layer
- Self-managed provider implementation
- Unit test suite
### 8.2 Phase 2: Database Integration (Week 3)
**Goals**: Add sandbox configuration to admin system
**Tasks**:
- [ ] Add sandbox entries to `conf/system_settings.json` initialization file
- [ ] Extend `SettingsMgr` in `admin/server/services.py` with sandbox-specific methods
- [ ] Add admin endpoints to `admin/server/routes.py`
- [ ] Implement configuration validation logic
- [ ] Add provider connection testing
- [ ] Write API tests
**Deliverables**:
- SystemSettings integration
- Admin API endpoints (`/api/admin/sandbox/*`)
- Configuration validation
- API test suite
### 8.3 Phase 3: Frontend UI (Week 4)
**Goals**: Build admin settings interface
**Tasks**:
- [ ] Create `web/src/pages/SandboxSettings/index.tsx`
- [ ] Implement dynamic form generation from provider schema
- [ ] Add connection testing UI
- [ ] Create TypeScript types
- [ ] Write frontend tests
**Deliverables**:
- Admin settings UI
- Type definitions
- Frontend test suite
### 8.4 Phase 4: SaaS Provider Implementation (Week 5-6)
**Goals**: Implement Aliyun Code Interpreter and E2B providers
**Tasks**:
- [ ] Implement `agent/sandbox/providers/aliyun_codeinterpreter.py`
- [ ] Implement `agent/sandbox/providers/e2b.py`
- [ ] Add provider-specific tests with mocking
- [ ] Document provider-specific behaviors
- [ ] Create provider setup guides
**Deliverables**:
- Aliyun Code Interpreter provider
- E2B provider
- Provider documentation
### 8.5 Phase 5: Agent Integration (Week 7)
**Goals**: Update agent components to use new provider system
**Tasks**:
- [ ] Update `agent/components/code_executor.py` to use ProviderManager
- [ ] Implement fallback logic
- [ ] Add tenant-specific provider loading
- [ ] Update agent tests
- [ ] Performance testing
**Deliverables**:
- Agent integration
- Fallback mechanism
- Updated test suite
### 8.6 Phase 6: Monitoring & Documentation (Week 8)
**Goals**: Add observability and complete documentation
**Tasks**:
- [ ] Implement metrics collection
- [ ] Add structured logging
- [ ] Configure alerts
- [ ] Write deployment guide
- [ ] Write user documentation
- [ ] Create troubleshooting guide
**Deliverables**:
- Monitoring dashboards
- Complete documentation
- Deployment guides
## 9. Testing Strategy
### 9.1 Unit Tests
**Provider Tests** (`test/agent/sandbox/providers/test_*.py`):
```python
class TestSelfManagedProvider:
def test_initialize_with_config():
provider = SelfManagedProvider()
assert provider.initialize({"endpoint": "http://localhost:9385"})
def test_create_python_instance():
provider = SelfManagedProvider()
provider.initialize(test_config)
instance = provider.create_instance("python")
assert instance.status == "running"
def test_execute_code():
provider = SelfManagedProvider()
result = provider.execute_code(instance_id, "print('hello')", "python")
assert result.exit_code == 0
assert "hello" in result.stdout
```
**Configuration Tests**:
- Test configuration validation for each provider schema
- Test error handling for invalid configurations
- Test secret field redaction
### 9.2 Integration Tests
**Provider Switching**:
- Test switching between providers
- Test fallback mechanism
- Test concurrent provider usage
**Multi-Tenant Isolation**:
- Test tenant configuration isolation
- Test instance ID scoping
- Test resource separation
**Admin API Tests**:
- Test CRUD operations for configurations
- Test connection testing endpoint
- Test validation error responses
### 9.3 E2E Tests
**Complete Flow Tests**:
```python
def test_sandbox_execution_flow():
# 1. Configure provider via admin API
setSandboxConfig(provider="self_managed", config={...})
# 2. Create agent task with code execution
task = create_agent_task(code="print('test')")
# 3. Execute task
result = execute_agent_task(task.id)
# 4. Verify result
assert result.status == "success"
assert "test" in result.output
# 5. Verify sandbox cleanup
assert get_active_instances() == 0
```
**Admin UI Tests**:
- Test provider configuration flow
- Test connection testing
- Test error handling in UI
### 9.4 Performance Tests
**Load Testing**:
- Test 100 concurrent executions
- Test pool exhaustion behavior
- Test queue performance (self-managed)
**Latency Testing**:
- Measure cold start time per provider
- Measure execution latency percentiles
- Compare provider performance
## 10. Cost Considerations
### 10.1 Self-Managed Costs
**Infrastructure**:
- Server hosting: $X/month (depends on specs)
- Maintenance: engineering time
- Scaling: manual, requires additional servers
**Pros**:
- Predictable costs
- No per-execution fees
- Full control over resources
**Cons**:
- High initial setup cost
- Operational overhead
- Finite capacity
### 10.2 SaaS Costs
**Aliyun Code Interpreter** (estimated):
- Pricing: execution time × memory configuration
- Example: 1000 executions/day × 30s × $0.01/1000s = ~$0.30/day
**E2B** (estimated):
- Pricing: $0.02/execution-second
- Example: 1000 executions/day × 30s × $0.02/s = ~$600/day
**Pros**:
- No upfront costs
- Automatic scaling
- No maintenance
**Cons**:
- Variable costs (can spike with usage)
- Network dependency
- Potential for runaway costs
### 10.3 Cost Optimization
**Recommendations**:
1. **Hybrid Approach**: Use self-managed for base load, SaaS for spikes
2. **Cost Monitoring**: Set budget alerts per tenant
3. **Resource Limits**: Enforce max executions per tenant/day
4. **Caching**: Reuse instances when possible (self-managed pool)
5. **Smart Routing**: Route to cheapest provider based on availability
## 11. Future Extensibility
The architecture supports easy addition of new providers:
### 11.1 Adding a New Provider
**Step 1**: Implement provider class with schema
```python
# agent/sandbox/providers/new_provider.py
from .base import SandboxProvider
class NewProvider(SandboxProvider):
@staticmethod
def get_config_schema() -> Dict[str, Dict]:
return {
"api_key": {
"type": "string",
"required": True,
"secret": True,
"label": "API Key"
},
"region": {
"type": "string",
"default": "us-east-1",
"label": "Region"
}
}
def initialize(self, config: Dict[str, Any]) -> bool:
self.api_key = config.get("api_key")
self.region = config.get("region", "us-east-1")
# Initialize client
return True
# Implement other abstract methods...
```
**Step 2**: Register in provider mapping
```python
# In api/apps/sandbox_app.py or wherever providers are listed
from agent.agent.sandbox.providers.new_provider import NewProvider
PROVIDER_CLASSES = {
"self_managed": SelfManagedProvider,
"aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
"e2b": E2BProvider,
"new_provider": NewProvider, # Add here
}
```
**No central registry to update** - just import and add to the mapping!
### 11.2 Potential Future Providers
- **GitHub Codespaces**: For GitHub-integrated workflows
- **Gitpod**: Cloud development environments
- **CodeSandbox**: Frontend code execution
- **AWS Firecracker**: Raw microVM management
- **Custom Provider**: User-defined provider implementations
### 11.3 Advanced Features
**Feature Pooling**:
- Share instances across executions (same language, same user)
- Warm pool for reduced latency
- Instance hibernation for cost savings
**Feature Multi-Region**:
- Route to nearest region
- Failover across regions
- Regional cost optimization
**Feature Hybrid Execution**:
- Split workloads between providers
- Dynamic provider selection based on cost/performance
- A/B testing for provider performance
## 12. Appendix
### 12.1 Configuration Examples
**SystemSettings Initialization File** (`conf/system_settings.json` - add these entries):
```json
{
"system_settings": [
{
"name": "sandbox.provider_type",
"source": "variable",
"data_type": "string",
"value": "self_managed"
},
{
"name": "sandbox.self_managed",
"source": "variable",
"data_type": "json",
"value": "{\"endpoint\": \"http://sandbox-internal:9385\", \"pool_size\": 20, \"max_memory\": \"512m\", \"timeout\": 60, \"enable_seccomp\": true, \"enable_ast_analysis\": true}"
},
{
"name": "sandbox.aliyun_codeinterpreter",
"source": "variable",
"data_type": "json",
"value": "{\"access_key_id\": \"\", \"access_key_secret\": \"\", \"account_id\": \"\", \"region\": \"cn-hangzhou\", \"template_name\": \"\", \"timeout\": 30}"
},
{
"name": "sandbox.e2b",
"source": "variable",
"data_type": "json",
"value": "{\"api_key\": \"\", \"region\": \"us\", \"timeout\": 30}"
}
]
}
```
**Admin API Request Example** (POST to `/api/admin/sandbox/config`):
```json
{
"provider_type": "self_managed",
"config": {
"endpoint": "http://sandbox-internal:9385",
"pool_size": 20,
"max_memory": "512m",
"timeout": 60,
"enable_seccomp": true,
"enable_ast_analysis": true
},
"test_connection": true,
"set_active": true
}
```
**Note**: The `config` object in the request is a plain JSON object. The API will serialize it to a JSON string before storing in SystemSettings.
**Admin API Response Example** (GET from `/api/admin/sandbox/config`):
```json
{
"data": {
"active": "self_managed",
"self_managed": {
"endpoint": "http://sandbox-internal:9385",
"pool_size": 20,
"max_memory": "512m",
"timeout": 60,
"enable_seccomp": true,
"enable_ast_analysis": true
},
"aliyun_codeinterpreter": {
"access_key_id": "",
"access_key_secret": "",
"region": "cn-hangzhou",
"workspace_id": ""
},
"e2b": {
"api_key": "",
"region": "us",
"timeout": 30
}
}
}
```
**Note**: The response deserializes the JSON strings back to objects for easier frontend consumption.
### 12.2 Error Codes
| Code | Description | Resolution |
|------|-------------|------------|
| SB001 | Provider not initialized | Configure provider in admin |
| SB002 | Invalid configuration | Check configuration values |
| SB003 | Connection failed | Check network and credentials |
| SB004 | Instance creation failed | Check provider capacity |
| SB005 | Execution timeout | Increase timeout or optimize code |
| SB006 | Out of memory | Reduce memory usage or increase limits |
| SB007 | Code blocked by security policy | Remove blocked imports/operations |
| SB008 | Rate limit exceeded | Reduce concurrency or upgrade plan |
| SB009 | Provider unavailable | Check provider status or use fallback |
### 12.3 References
- [Current Sandbox Implementation](../sandbox/README.md)
- [RAGFlow Admin System](../CONTRIBUTING.md)
- [Daytona Documentation](https://daytona.dev/docs)
- [Aliyun Code Interpreter](https://help.aliyun.com/...)
- [E2B Documentation](https://e2b.dev/docs)
---
**Document Version**: 1.0
**Last Updated**: 2025-01-26
**Author**: RAGFlow Team
**Status**: Design Specification - Ready for Review
## Appendix C: Configuration Storage Considerations
### Current Implementation
- **Storage**: SystemSettings table with `value` field as `TextField` (unlimited length)
- **Migration**: Database migration added to convert from `CharField(1024)` to `TextField`
- **Benefit**: Supports arbitrarily long API keys, workspace IDs, and other SaaS provider credentials
### Validation
- **Schema validation**: Type checking, range validation, required field validation
- **Provider-specific validation**: Custom validation via `validate_config()` method
- **Example**: SelfManagedProvider validates URL format, timeout ranges, pool size constraints
### Configuration Storage Format
Each provider's configuration is stored as JSON in `SystemSettings.value`:
- `sandbox.provider_type`: Active provider selection
- `sandbox.self_managed`: Self-managed provider JSON config
- `sandbox.aliyun_codeinterpreter`: Aliyun provider JSON config
- `sandbox.e2b`: E2B provider JSON config
## Appendix D: Configuration Hot Reload Limitations
### Current Behavior
**Provider Configuration Requires Restart**: When switching sandbox providers in the admin panel, the ragflow service must be restarted for changes to take effect.
**Reason**:
- Admin and ragflow are separate processes
- ragflow loads sandbox provider configuration only at startup
- The `get_provider_manager()` function caches the provider globally
- Configuration changes in MySQL are not automatically detected
**Impact**:
- Switching from `self_managed` → `aliyun_codeinterpreter` requires ragflow restart
- Updating credentials/config requires ragflow restart
- Not a dynamic configuration system
**Workarounds**:
1. **Production**: Restart ragflow service after configuration changes:
```bash
cd docker
docker compose restart ragflow-server
```
2. **Development**: Use the `reload_provider()` function in code:
```python
from agent.sandbox.client import reload_provider
reload_provider() # Reloads from MySQL settings
```
**Future Enhancement**:
To support hot reload without restart, implement configuration change detection:
```python
# In agent/sandbox/client.py
_config_timestamp: Optional[int] = None
def get_provider_manager() -> ProviderManager:
global _provider_manager, _config_timestamp
# Check if configuration has changed
setting = SystemSettingsService.get_by_name("sandbox.provider_type")
current_timestamp = setting[0].update_time if setting else 0
if _config_timestamp is None or current_timestamp > _config_timestamp:
# Configuration changed, reload provider
_provider_manager = None
_load_provider_from_settings()
_config_timestamp = current_timestamp
return _provider_manager
```
However, this adds overhead on every `execute_code()` call. For production use, explicit restart is preferred for simplicity and reliability.
## Appendix E: Arguments Parameter Support
### Overview
All sandbox providers support passing arguments to the `main()` function in user code. This enables dynamic parameter injection for code execution.
### Implementation Details
**Base Interface**:
```python
# agent/sandbox/providers/base.py
@abstractmethod
def execute_code(
self,
instance_id: str,
code: str,
language: str,
timeout: int = 10,
arguments: Optional[Dict[str, Any]] = None
) -> ExecutionResult:
"""
Execute code in the sandbox.
The code should contain a main() function that will be called with:
- Python: main(**arguments) if arguments provided, else main()
- JavaScript: main(arguments) if arguments provided, else main()
"""
pass
```
**Provider Implementations**:
1. **Self-Managed Provider** ([self_managed.py:164](agent/sandbox/providers/self_managed.py:164)):
- Passes arguments via HTTP API: `"arguments": arguments or {}`
- executor_manager receives and passes to code via command line
- Runner script: `args = json.loads(sys.argv[1])` then `result = main(**args)`
2. **Aliyun Code Interpreter** ([aliyun_codeinterpreter.py:260-275](agent/sandbox/providers/aliyun_codeinterpreter.py:260-275)):
- Wraps user code to call `main(**arguments)` or `main()` if no arguments
- Python example:
```python
if arguments:
wrapped_code = f'''{code}
if __name__ == "__main__":
import json
result = main(**{json.dumps(arguments)})
print(json.dumps(result) if isinstance(result, dict) else result)
'''
```
- JavaScript example:
```javascript
if arguments:
wrapped_code = f'''{code}
const result = main({json.dumps(arguments)});
console.log(typeof result === 'object' ? JSON.stringify(result) : String(result));
'''
```
**Client Layer** ([client.py:138-190](agent/sandbox/client.py:138-190)):
```python
def execute_code(
code: str,
language: str = "python",
timeout: int = 30,
arguments: Optional[Dict[str, Any]] = None
) -> ExecutionResult:
provider_manager = get_provider_manager()
provider = provider_manager.get_provider()
instance = provider.create_instance(template=language)
try:
result = provider.execute_code(
instance_id=instance.instance_id,
code=code,
language=language,
timeout=timeout,
arguments=arguments # Passed through to provider
)
return result
finally:
provider.destroy_instance(instance.instance_id)
```
**CodeExec Tool Integration** ([code_exec.py:136-165](agent/tools/code_exec.py:136-165)):
```python
def _execute_code(self, language: str, code: str, arguments: dict):
# ... collect arguments from component configuration
result = sandbox_execute_code(
code=code,
language=language,
timeout=int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10 * 60)),
arguments=arguments # Passed through to sandbox client
)
```
### Usage Examples
**Python Code with Arguments**:
```python
# User code
def main(name: str, count: int) -> dict:
"""Generate greeting"""
return {"message": f"Hello {name}!" * count}
# Called with: arguments={"name": "World", "count": 3}
# Result: {"message": "Hello World!Hello World!Hello World!"}
```
**JavaScript Code with Arguments**:
```javascript
// User code
function main(args) {
const { name, count } = args;
return `Hello ${name}!`.repeat(count);
}
// Called with: arguments={"name": "World", "count": 3}
// Result: "Hello World!Hello World!Hello World!"
```
### Important Notes
1. **Function Signature**: Code MUST define a `main()` function
- Python: `def main(**kwargs)` or `def main()` if no arguments
- JavaScript: `function main(args)` or `function main()` if no arguments
2. **Type Consistency**: Arguments are passed as JSON, so types are preserved:
- Numbers → int/float
- Strings → str
- Booleans → bool
- Objects → dict (Python) / object (JavaScript)
- Arrays → list (Python) / array (JavaScript)
3. **Return Value**: Return value is serialized as JSON for parsing
- Python: `print(json.dumps(result))` if dict
- JavaScript: `console.log(JSON.stringify(result))` if object
4. **Provider Alignment**: All providers (self_managed, aliyun_codeinterpreter, e2b) implement arguments passing consistently