ragflow/docs/develop/sandbox_spec.md
Zhichang Yu fd11aca8e5 feat: Implement pluggable multi-provider sandbox architecture (#12820)
## Summary

Implement a flexible sandbox provider system supporting both
self-managed (Docker) and SaaS (Aliyun Code Interpreter) backends for
secure code execution in agent workflows.

**Key Changes:**
-  Aliyun Code Interpreter provider using official
`agentrun-sdk>=0.0.16`
-  Self-managed provider with gVisor (runsc) security
-  Arguments parameter support for dynamic code execution
-  Database-only configuration (removed fallback logic)
-  Configuration scripts for quick setup

Issue #12479

## Features

### 🔌 Provider Abstraction Layer

**1. Self-Managed Provider** (`agent/sandbox/providers/self_managed.py`)
- Wraps existing executor_manager HTTP API
- gVisor (runsc) for secure container isolation
- Configurable pool size, timeout, retry logic
- Languages: Python, Node.js, JavaScript
- ⚠️ **Requires**: gVisor installation, Docker, base images

**2. Aliyun Code Interpreter**
(`agent/sandbox/providers/aliyun_codeinterpreter.py`)
- SaaS integration using official agentrun-sdk
- Serverless microVM execution with auto-authentication
- Hard timeout: 30 seconds max
- Credentials: `AGENTRUN_ACCESS_KEY_ID`, `AGENTRUN_ACCESS_KEY_SECRET`,
`AGENTRUN_ACCOUNT_ID`, `AGENTRUN_REGION`
- Automatically wraps code to call `main()` function

**3. E2B Provider** (`agent/sandbox/providers/e2b.py`)
- Placeholder for future integration

### ⚙️ Configuration System

- `conf/system_settings.json`: Default provider =
`aliyun_codeinterpreter`
- `agent/sandbox/client.py`: Enforces database-only configuration
- Admin UI: `/admin/sandbox-settings`
- Configuration validation via `validate_config()` method
- Health checks for all providers

### 🎯 Key Capabilities

**Arguments Parameter Support:**
All providers support passing arguments to `main()` function:
```python
# User code
def main(name: str, count: int) -> dict:
    return {"message": f"Hello {name}!" * count}

# Executed with: arguments={"name": "World", "count": 3}
# Result: {"message": "Hello World!Hello World!Hello World!"}
```
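One way a provider could pass `arguments` to `main()` is to append a small driver to the user's code. The driver decodes the arguments from JSON, calls `main(**arguments)`, and prints the JSON-encoded return value. The sketch below assumes that convention. `wrap_with_main_call` is a hypothetical helper, not a function from the RAGFlow codebase, and the real providers may wrap code differently.

```python
import json


def wrap_with_main_call(user_code: str, arguments: dict) -> str:
    """Hypothetical helper: append a driver that calls main(**arguments).

    The arguments are serialized to JSON, then embedded with repr() so the
    driver receives them as a valid Python string literal.
    """
    encoded_args = repr(json.dumps(arguments))
    driver = (
        "\n\n"
        "import json as _json\n"
        f"_args = _json.loads({encoded_args})\n"
        "_result = main(**_args)\n"
        "print(_json.dumps(_result))\n"
    )
    return user_code + driver


if __name__ == "__main__":
    code = (
        "def main(name: str, count: int) -> dict:\n"
        '    return {"message": f"Hello {name}!" * count}\n'
    )
    # A real provider would send this string to the sandbox; exec() here only
    # shows locally what the wrapped program does.
    exec(wrap_with_main_call(code, {"name": "World", "count": 3}), {})
    # prints {"message": "Hello World!Hello World!Hello World!"}
```

Printing JSON keeps the result parseable from stdout, which fits the `stdout` field of `ExecutionResult`.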

**Self-Describing Providers:**
Each provider implements `get_config_schema()` returning form
configuration for Admin UI

**Error Handling:**
Structured `ExecutionResult` with stdout, stderr, exit_code,
execution_time

## Configuration Scripts

Two scripts for quick Aliyun sandbox setup:

**Shell Script (requires jq):**
```bash
source scripts/configure_aliyun_sandbox.sh
```

**Python Script (interactive):**
```bash
python3 scripts/configure_aliyun_sandbox.py
```

## Testing

```bash
# Unit tests
uv run pytest agent/sandbox/tests/test_providers.py -v

# Aliyun provider tests
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v

# Integration tests (requires credentials)
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v

# Quick SDK validation
python3 agent/sandbox/tests/verify_sdk.py
```

**Test Coverage:**
- 30 unit tests for provider abstraction
- Provider-specific tests for Aliyun
- Integration tests with real API
- Security tests for executor_manager

## Documentation

- `docs/develop/sandbox_spec.md` - Complete architecture specification
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration from legacy
sandbox
- `agent/sandbox/tests/QUICKSTART.md` - Quick start guide
- `agent/sandbox/tests/README.md` - Testing documentation

## Breaking Changes

⚠️ **Migration Required:**

1. **Directory Move**: `sandbox/` → `agent/sandbox/`
   - Update imports: `from sandbox.` → `from agent.sandbox.`

2. **Mandatory Configuration**:
   - SystemSettings must have `sandbox.provider_type` configured
   - Removed fallback default values
   - Configuration must exist in the database (from `conf/system_settings.json`)

3. **Aliyun Credentials**:
   - Requires `AGENTRUN_*` environment variables (not `ALIYUN_*`)
   - `AGENTRUN_ACCOUNT_ID` is now required (Aliyun primary account ID)

4. **Self-Managed Provider**:
   - gVisor (runsc) must be installed for security
   - Install: `go install gvisor.dev/gvisor/runsc@latest`

## Database Schema Changes

```text
# SystemSettings.value: CharField → TextField
api/db/db_models.py: Changed for unlimited config length

# SystemSettingsService.get_by_name(): Fixed query precision
api/db/services/system_settings_service.py: startswith → exact match
```
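The switch from `startswith` to an exact match matters because setting names can share prefixes. The sqlite3 sketch below compares the two lookups on an in-memory table. The table layout is simplified. The `sandbox.e2b_backup` row is a hypothetical key, added only to create a prefix collision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system_settings (name TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO system_settings (name, value) VALUES (?, ?)",
    [
        ("sandbox.provider_type", "e2b"),
        ("sandbox.e2b", '{"region": "us"}'),
        ("sandbox.e2b_backup", '{"region": "eu"}'),  # hypothetical key sharing a prefix
    ],
)


def get_by_name_prefix(name: str) -> list:
    # Old behaviour: any row whose name starts with `name` matches.
    return conn.execute(
        "SELECT name, value FROM system_settings "
        "WHERE substr(name, 1, length(?)) = ? ORDER BY name",
        (name, name),
    ).fetchall()


def get_by_name_exact(name: str) -> list:
    # New behaviour: only the row with exactly this name matches.
    return conn.execute(
        "SELECT name, value FROM system_settings WHERE name = ?", (name,)
    ).fetchall()


print(get_by_name_prefix("sandbox.e2b"))  # two rows: sandbox.e2b and sandbox.e2b_backup
print(get_by_name_exact("sandbox.e2b"))   # one row: sandbox.e2b
```

Callers that read `setting[0].value`, as the admin endpoints do, could silently pick the wrong row under prefix matching. An exact match makes the lookup unambiguous.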

## Files Changed

### Backend (Python)
- `agent/sandbox/providers/base.py` - SandboxProvider ABC interface
- `agent/sandbox/providers/manager.py` - ProviderManager
- `agent/sandbox/providers/self_managed.py` - Self-managed provider
- `agent/sandbox/providers/aliyun_codeinterpreter.py` - Aliyun provider
- `agent/sandbox/providers/e2b.py` - E2B provider (placeholder)
- `agent/sandbox/client.py` - Unified client (enforces DB-only config)
- `agent/tools/code_exec.py` - Updated to use provider system
- `admin/server/services.py` - SandboxMgr with registry & validation
- `admin/server/routes.py` - 5 sandbox API endpoints
- `conf/system_settings.json` - Default: aliyun_codeinterpreter
- `api/db/db_models.py` - TextField for SystemSettings.value
- `api/db/services/system_settings_service.py` - Exact match query

### Frontend (TypeScript/React)
- `web/src/pages/admin/sandbox-settings.tsx` - Settings UI
- `web/src/services/admin-service.ts` - Sandbox service functions
- `web/src/services/admin.service.d.ts` - Type definitions
- `web/src/utils/api.ts` - Sandbox API endpoints

### Documentation
- `docs/develop/sandbox_spec.md` - Architecture spec
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration guide
- `agent/sandbox/tests/QUICKSTART.md` - Quick start
- `agent/sandbox/tests/README.md` - Testing guide

### Configuration Scripts
- `scripts/configure_aliyun_sandbox.sh` - Shell script (jq)
- `scripts/configure_aliyun_sandbox.py` - Python script

### Tests
- `agent/sandbox/tests/test_providers.py` - 30 unit tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter.py` - Provider tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py` -
Integration tests
- `agent/sandbox/tests/verify_sdk.py` - SDK validation

## Architecture

```
Admin UI → Admin API → SandboxMgr → ProviderManager → [SelfManaged|Aliyun|E2B]
                                      ↓
                                  SystemSettings
```

## Usage

### 1. Configure Provider

**Via Admin UI:**
1. Navigate to `/admin/sandbox-settings`
2. Select provider (Aliyun Code Interpreter / Self-Managed)
3. Fill in configuration
4. Click "Test Connection" to verify
5. Click "Save" to apply

**Via Configuration Scripts:**
```bash
# Aliyun provider
export AGENTRUN_ACCESS_KEY_ID="xxx"
export AGENTRUN_ACCESS_KEY_SECRET="yyy"
export AGENTRUN_ACCOUNT_ID="zzz"
export AGENTRUN_REGION="cn-shanghai"
source scripts/configure_aliyun_sandbox.sh
```

### 2. Restart Service

```bash
cd docker
docker compose restart ragflow-server
```

### 3. Execute Code in Agent

```python
from agent.sandbox.client import execute_code

result = execute_code(
    code='def main(name: str) -> dict: return {"message": f"Hello {name}!"}',
    language="python",
    timeout=30,
    arguments={"name": "World"}
)

print(result.stdout)  # {"message": "Hello World!"}
```

## Troubleshooting

### "Container pool is busy" (Self-Managed)
- **Cause**: Pool exhausted (default: 1 container in `.env`)
- **Fix**: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` to 5+

### "Sandbox provider type not configured"
- **Cause**: Database missing configuration
- **Fix**: Run config script or set via Admin UI

### "gVisor not found"
- **Cause**: runsc not installed
- **Fix**: `go install gvisor.dev/gvisor/runsc@latest && sudo cp
~/go/bin/runsc /usr/local/bin/`

### Aliyun authentication errors
- **Cause**: Wrong environment variable names
- **Fix**: Use `AGENTRUN_*` prefix (not `ALIYUN_*`)

## Checklist

- [x] All tests passing (30 unit tests + integration tests)
- [x] Documentation updated (spec, migration guide, quickstart)
- [x] Type definitions added (TypeScript)
- [x] Admin UI implemented
- [x] Configuration validation
- [x] Health checks implemented
- [x] Error handling with structured results
- [x] Breaking changes documented
- [x] Configuration scripts created
- [x] gVisor requirements documented

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-01-28 13:28:21 +08:00


RAGFlow Sandbox Multi-Provider Architecture - Design Specification

1. Overview

1.1 Goals

Enable RAGFlow to support multiple sandbox deployment modes:

  • Self-Managed: On-premise deployment using Daytona/Docker (current implementation)
  • SaaS Providers: Cloud-based sandbox services (Aliyun Code Interpreter, E2B)

1.2 Key Requirements

  • Provider-agnostic interface for sandbox operations
  • Admin-configurable provider settings with dynamic schema
  • Multi-tenant isolation (1:1 session-to-sandbox mapping)
  • Graceful fallback and error handling
  • Unified monitoring and observability

2. Architecture Design

2.1 Provider Abstraction Layer

Location: agent/sandbox/providers/

Define a unified SandboxProvider interface:

# agent/sandbox/providers/base.py
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional
from dataclasses import dataclass

@dataclass
class SandboxInstance:
    instance_id: str
    provider: str
    status: str  # running, stopped, error
    metadata: Dict[str, Any]

@dataclass
class ExecutionResult:
    stdout: str
    stderr: str
    exit_code: int
    execution_time: float
    metadata: Dict[str, Any]

class SandboxProvider(ABC):
    """Base interface for all sandbox providers"""

    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> bool:
        """Initialize provider with configuration"""
        pass

    @abstractmethod
    def create_instance(self, template: str = "python") -> SandboxInstance:
        """Create a new sandbox instance"""
        pass

    @abstractmethod
    def execute_code(
        self,
        instance_id: str,
        code: str,
        language: str,
        timeout: int = 10
    ) -> ExecutionResult:
        """Execute code in the sandbox"""
        pass

    @abstractmethod
    def destroy_instance(self, instance_id: str) -> bool:
        """Destroy a sandbox instance"""
        pass

    @abstractmethod
    def health_check(self) -> bool:
        """Check if provider is healthy"""
        pass

    @abstractmethod
    def get_supported_languages(self) -> list[str]:
        """Get list of supported programming languages"""
        pass

    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        """
        Return configuration schema for this provider.

        Returns a dictionary mapping field names to their schema definitions,
        including type, required status, validation rules, labels, and descriptions.
        """
        return {}

    def validate_config(self, config: Dict[str, Any]) -> tuple[bool, Optional[str]]:
        """
        Validate provider-specific configuration.

        This method allows providers to implement custom validation logic beyond
        the basic schema validation. Override this method to add provider-specific
        checks like URL format validation, API key format validation, etc.

        Args:
            config: Configuration dictionary to validate

        Returns:
            Tuple of (is_valid, error_message):
                - is_valid: True if configuration is valid, False otherwise
                - error_message: Error message if invalid, None if valid
        """
        # Default implementation: no custom validation
        return True, None
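To make the contract concrete, the sketch below implements the interface with a hypothetical LocalSubprocessProvider that runs Python code in a plain subprocess. It provides NO isolation. It only shows what each method is expected to accept and return. A trimmed copy of the interface is repeated so the snippet runs on its own. For brevity, metadata defaults to an empty dict here.

```python
import subprocess
import sys
import time
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SandboxInstance:
    instance_id: str
    provider: str
    status: str  # running, stopped, error
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ExecutionResult:
    stdout: str
    stderr: str
    exit_code: int
    execution_time: float
    metadata: Dict[str, Any] = field(default_factory=dict)


class SandboxProvider(ABC):
    """Trimmed copy of the interface above (schema/validation hooks omitted)."""

    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> bool: ...

    @abstractmethod
    def create_instance(self, template: str = "python") -> SandboxInstance: ...

    @abstractmethod
    def execute_code(self, instance_id: str, code: str, language: str,
                     timeout: int = 10) -> ExecutionResult: ...

    @abstractmethod
    def destroy_instance(self, instance_id: str) -> bool: ...

    @abstractmethod
    def health_check(self) -> bool: ...

    @abstractmethod
    def get_supported_languages(self) -> list[str]: ...


class LocalSubprocessProvider(SandboxProvider):
    """Hypothetical provider for local experiments only: NO sandboxing."""

    def __init__(self) -> None:
        self._instances: Dict[str, SandboxInstance] = {}

    def initialize(self, config: Dict[str, Any]) -> bool:
        return True  # nothing to configure for a local subprocess

    def create_instance(self, template: str = "python") -> SandboxInstance:
        instance = SandboxInstance(str(uuid.uuid4()), "local_subprocess", "running",
                                   {"template": template})
        self._instances[instance.instance_id] = instance
        return instance

    def execute_code(self, instance_id: str, code: str, language: str,
                     timeout: int = 10) -> ExecutionResult:
        if instance_id not in self._instances:
            raise KeyError(f"Unknown instance: {instance_id}")
        if language not in self.get_supported_languages():
            raise ValueError(f"Unsupported language: {language}")
        start = time.monotonic()
        try:
            proc = subprocess.run([sys.executable, "-c", code],
                                  capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # Report the timeout as a structured result instead of raising.
            return ExecutionResult("", f"Timed out after {timeout}s", -1,
                                   time.monotonic() - start, {"timed_out": True})
        return ExecutionResult(proc.stdout, proc.stderr, proc.returncode,
                               time.monotonic() - start, {"instance_id": instance_id})

    def destroy_instance(self, instance_id: str) -> bool:
        return self._instances.pop(instance_id, None) is not None

    def health_check(self) -> bool:
        return True

    def get_supported_languages(self) -> list[str]:
        return ["python"]
```

Returning timeouts and non-zero exits as ExecutionResult values, rather than raising, is one way to keep error handling structured for callers.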

2.2 Provider Implementations

2.2.1 Self-Managed Provider

File: agent/sandbox/providers/self_managed.py

Wraps the existing executor_manager implementation.

Prerequisites:

  • gVisor (runsc): Required for secure container isolation. Install with:
    go install gvisor.dev/gvisor/runsc@latest
    sudo cp ~/go/bin/runsc /usr/local/bin/
    runsc --version
    
    Or download from: https://github.com/google/gvisor/releases
  • Docker: Docker runtime with gVisor support
  • Base Images: Pull sandbox base images:
    docker pull infiniflow/sandbox-base-python:latest
    docker pull infiniflow/sandbox-base-nodejs:latest
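Before enabling the self-managed provider, it can help to check these prerequisites programmatically. The sketch below is a hypothetical check_gvisor_prerequisites helper. It looks for runsc on the PATH and at /usr/local/bin/runsc, then asks Docker whether a runtime named runsc is registered. It assumes the runtime was registered under that name (for example with runsc install). It only reports findings and changes nothing.

```python
import os
import shutil
import subprocess
from typing import Dict


def check_gvisor_prerequisites(runsc_path: str = "/usr/local/bin/runsc") -> Dict[str, bool]:
    """Hypothetical preflight check; returns one boolean per prerequisite."""
    results = {
        "runsc_on_path": shutil.which("runsc") is not None,
        "runsc_at_expected_path": os.path.isfile(runsc_path) and os.access(runsc_path, os.X_OK),
        "docker_has_runsc_runtime": False,
    }
    docker = shutil.which("docker")
    if docker is None:
        return results
    try:
        # Docker prints its registered runtimes (e.g. runc, runsc) as JSON.
        proc = subprocess.run(
            [docker, "info", "--format", "{{json .Runtimes}}"],
            capture_output=True, text=True, timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return results
    results["docker_has_runsc_runtime"] = proc.returncode == 0 and '"runsc"' in proc.stdout
    return results


if __name__ == "__main__":
    for check, ok in check_gvisor_prerequisites().items():
        print(f"{'OK  ' if ok else 'MISS'} {check}")
```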
    

Configuration: executor_manager HTTP endpoint, pool size, resource limits

  • endpoint: HTTP endpoint (default: "http://localhost:9385")
  • timeout: Request timeout in seconds (default: 30)
  • max_retries: Maximum retry attempts (default: 3)
  • pool_size: Container pool size (default: 10)

Languages: Python, Node.js, JavaScript

Security: gVisor (runsc runtime), seccomp, read-only filesystem, memory limits

Advantages:

  • Low latency (<90ms), data privacy, full control
  • No per-execution costs
  • Supports arguments parameter for passing data to main() function

Limitations:

  • Operational overhead, finite resources
  • Requires gVisor installation for security
  • Pool exhaustion causes "Container pool is busy" errors

Common Issues:

  • "Container pool is busy": Increase SANDBOX_EXECUTOR_MANAGER_POOL_SIZE (the default in .env is 1; 5 or more is recommended)
  • Container creation fails: Ensure gVisor is installed and accessible at /usr/local/bin/runsc
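The pool errors above follow from the fixed size of the container pool. The sketch below models that behavior with a threading.BoundedSemaphore. Each execution holds one slot, and a request that finds no free slot fails immediately with the same message. ContainerPool and PoolBusyError are hypothetical names. The real executor_manager may queue or wait instead of failing fast.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class PoolBusyError(RuntimeError):
    """Raised when every container slot is in use."""


class ContainerPool:
    """Hypothetical fixed-size pool; one slot per container."""

    def __init__(self, pool_size: int) -> None:
        if pool_size < 1:
            raise ValueError("pool_size must be >= 1")
        self._slots = threading.BoundedSemaphore(pool_size)

    @contextmanager
    def container(self) -> Iterator[None]:
        # Fail fast instead of waiting when no slot is free.
        if not self._slots.acquire(blocking=False):
            raise PoolBusyError("Container pool is busy")
        try:
            yield
        finally:
            self._slots.release()


if __name__ == "__main__":
    pool = ContainerPool(pool_size=1)  # mirrors the .env default of 1
    with pool.container():
        try:
            with pool.container():  # a second, overlapping execution
                pass
        except PoolBusyError as exc:
            print(exc)  # Container pool is busy
```

With a pool size of 1, any two overlapping executions collide. This is why raising SANDBOX_EXECUTOR_MANAGER_POOL_SIZE makes the error go away.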

2.2.2 Aliyun Code Interpreter Provider

File: agent/sandbox/providers/aliyun_codeinterpreter.py

SaaS integration with Aliyun Function Compute Code Interpreter service using the official agentrun-sdk.

Official Resources:

Implementation:

  • Uses official agentrun-sdk package
  • SDK handles authentication (AccessKey signature) automatically
  • Supports environment variable configuration
  • Structured error handling with ServerError exceptions

Configuration:

  • access_key_id: Aliyun AccessKey ID
  • access_key_secret: Aliyun AccessKey Secret
  • account_id: Aliyun primary account ID (主账号ID) - Required for API calls
  • region: Region (cn-hangzhou, cn-beijing, cn-shanghai, cn-shenzhen, cn-guangzhou)
  • template_name: Optional sandbox template name for pre-configured environments
  • timeout: Execution timeout (max 30 seconds - hard limit)
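The provider also supports environment-variable configuration. A small loader can map the AGENTRUN_* variables onto the config keys above and enforce the 30-second ceiling. The sketch below is hypothetical: aliyun_config_from_env is not part of agentrun-sdk or RAGFlow. The cn-hangzhou default and the region list mirror the configuration schema. The loader also flags the common mistake of exporting ALIYUN_* names instead.

```python
import os
from typing import Dict, Mapping, Optional

ENV_TO_CONFIG = {
    "AGENTRUN_ACCESS_KEY_ID": "access_key_id",
    "AGENTRUN_ACCESS_KEY_SECRET": "access_key_secret",
    "AGENTRUN_ACCOUNT_ID": "account_id",
    "AGENTRUN_REGION": "region",
}
SUPPORTED_REGIONS = {"cn-hangzhou", "cn-beijing", "cn-shanghai", "cn-shenzhen", "cn-guangzhou"}
MAX_TIMEOUT_SECONDS = 30  # hard limit of the service


def aliyun_config_from_env(environ: Optional[Mapping[str, str]] = None,
                           timeout: int = 30) -> Dict[str, object]:
    """Hypothetical loader: build a provider config dict from AGENTRUN_* variables."""
    env = os.environ if environ is None else environ
    config: Dict[str, object] = {"region": "cn-hangzhou", "timeout": timeout}
    for env_name, key in ENV_TO_CONFIG.items():
        if env.get(env_name):
            config[key] = env[env_name]
    missing = [k for k in ("access_key_id", "access_key_secret", "account_id") if k not in config]
    if missing:
        hint = ""
        if any(name.startswith("ALIYUN_") for name in env):
            hint = " (found ALIYUN_* variables; use the AGENTRUN_* prefix instead)"
        raise ValueError(f"Missing Aliyun settings: {', '.join(missing)}{hint}")
    if config["region"] not in SUPPORTED_REGIONS:
        raise ValueError(f"Unsupported region: {config['region']}")
    if not 1 <= timeout <= MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds")
    return config
```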

Languages: Python, JavaScript

Security: Serverless microVM isolation, 30-second hard timeout limit

Advantages:

  • Official SDK with automatic signature handling
  • Unlimited scalability, no maintenance
  • China region support with low latency
  • Built-in file system management
  • Support for execution contexts (Jupyter kernel)
  • Context-based execution for state persistence

Limitations:

  • Network dependency
  • 30-second execution time limit (hard limit)
  • Pay-as-you-go costs
  • Requires Aliyun primary account ID for API calls

Setup Instructions - Creating a RAM User with Minimal Privileges:

⚠️ Security Warning: Never use your Aliyun primary account (root account) AccessKey for SDK operations. Primary accounts have full resource permissions, and leaked credentials pose significant security risks.

Step 1: Create a RAM User

  1. Log in to RAM Console
  2. Navigate to People → Users
  3. Click Create User
  4. Configure the user:
    • Username: e.g., ragflow-sandbox-user
    • Display Name: e.g., RAGFlow Sandbox Service Account
    • Access Mode: Check OpenAPI/Programmatic Access (this creates an AccessKey)
    • Console Login: Optional (not needed for SDK-only access)
  5. Click OK and save the AccessKey ID and Secret immediately (displayed only once!)

Step 2: Create a Custom Authorization Policy

Navigate to Permissions → Policies → Create Policy → Custom Policy → Configuration Script (JSON)

Choose one of the following policy options based on your security requirements:

Option A: Minimal Privilege Policy (Recommended)

Grants only the permissions required by the AgentRun SDK:

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "agentrun:CreateTemplate",
        "agentrun:GetTemplate",
        "agentrun:UpdateTemplate",
        "agentrun:DeleteTemplate",
        "agentrun:ListTemplates",
        "agentrun:CreateSandbox",
        "agentrun:GetSandbox",
        "agentrun:DeleteSandbox",
        "agentrun:StopSandbox",
        "agentrun:ListSandboxes",
        "agentrun:CreateContext",
        "agentrun:ExecuteCode",
        "agentrun:DeleteContext",
        "agentrun:ListContexts",
        "agentrun:CreateFile",
        "agentrun:GetFile",
        "agentrun:DeleteFile",
        "agentrun:ListFiles",
        "agentrun:CreateProcess",
        "agentrun:GetProcess",
        "agentrun:KillProcess",
        "agentrun:ListProcesses",
        "agentrun:CreateRecording",
        "agentrun:GetRecording",
        "agentrun:DeleteRecording",
        "agentrun:ListRecordings",
        "agentrun:CheckHealth"
      ],
      "Resource": [
        "acs:agentrun:*:{account_id}:template/*",
        "acs:agentrun:*:{account_id}:sandbox/*"
      ]
    }
  ]
}

Replace {account_id} with your Aliyun primary account ID
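The policy text itself contains braces, so filling in {account_id} with Python's str.format would fail. A plain string replacement is simpler. The helper below is a hypothetical convenience for producing the policy text to paste into the console. Its action list is abbreviated; use the full list from Option A or B.

```python
import json

# Abbreviated template: copy the full Action list from Option A or B in practice.
POLICY_TEMPLATE = """{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["agentrun:CreateSandbox", "agentrun:GetSandbox", "agentrun:DeleteSandbox"],
      "Resource": [
        "acs:agentrun:*:{account_id}:template/*",
        "acs:agentrun:*:{account_id}:sandbox/*"
      ]
    }
  ]
}"""


def render_policy(template: str, account_id: str) -> str:
    """Hypothetical helper: substitute the account ID and check the JSON parses."""
    account_id = account_id.strip()
    if not account_id:
        raise ValueError("account_id must not be empty")
    rendered = template.replace("{account_id}", account_id)
    json.loads(rendered)  # fail early if the result is not valid JSON
    return rendered


if __name__ == "__main__":
    print(render_policy(POLICY_TEMPLATE, "1234567890"))
```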

Option B: Resource-Level Privilege Control (Most Secure)

Limits access to specific resource prefixes:

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "agentrun:CreateTemplate",
        "agentrun:GetTemplate",
        "agentrun:UpdateTemplate",
        "agentrun:DeleteTemplate",
        "agentrun:ListTemplates"
      ],
      "Resource": "acs:agentrun:*:{account_id}:template/ragflow-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "agentrun:CreateSandbox",
        "agentrun:GetSandbox",
        "agentrun:DeleteSandbox",
        "agentrun:StopSandbox",
        "agentrun:ListSandboxes",
        "agentrun:CheckHealth"
      ],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*"
    },
    {
      "Effect": "Allow",
      "Action": ["agentrun:*"],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*/context/*"
    },
    {
      "Effect": "Allow",
      "Action": ["agentrun:*"],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*/file/*"
    },
    {
      "Effect": "Allow",
      "Action": ["agentrun:*"],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*/process/*"
    },
    {
      "Effect": "Allow",
      "Action": ["agentrun:*"],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*/recording/*"
    }
  ]
}

This restricts template operations to templates whose names start with ragflow-.

Option C: Full Access (Not Recommended for Production)

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "agentrun:*",
      "Resource": "*"
    }
  ]
}

Step 3: Authorize the RAM User

  1. Return to Users list
  2. Find the user you just created (e.g., ragflow-sandbox-user)
  3. Click Add Permissions in the Actions column
  4. In the Custom Policy tab, select the policy you created in Step 2
  5. Click OK

Step 4: Configure RAGFlow with the RAM User Credentials

After creating the RAM user and obtaining the AccessKey, configure it in RAGFlow's admin settings or environment variables:

# Method 1: Environment variables (for development/testing)
export AGENTRUN_ACCESS_KEY_ID="LTAI5t..."  # RAM user's AccessKey ID
export AGENTRUN_ACCESS_KEY_SECRET="xxx..."  # RAM user's AccessKey Secret
export AGENTRUN_ACCOUNT_ID="123456789..."  # Your primary account ID
export AGENTRUN_REGION="cn-hangzhou"

Or via Admin UI (recommended for production):

  1. Navigate to Admin Settings → Sandbox Providers
  2. Select Aliyun Code Interpreter provider
  3. Fill in the configuration:
    • access_key_id: RAM user's AccessKey ID
    • access_key_secret: RAM user's AccessKey Secret
    • account_id: Your primary account ID
    • region: e.g., cn-hangzhou

Step 5: Verify Permissions

Test if the RAM user permissions are correctly configured:

from agentrun.sandbox import Sandbox, TemplateInput, TemplateType

try:
    # Test template creation
    template = Sandbox.create_template(
        input=TemplateInput(
            template_name="ragflow-permission-test",
            template_type=TemplateType.CODE_INTERPRETER
        )
    )
    print("✅ RAM user permissions are correctly configured")
except Exception as e:
    print(f"❌ Permission test failed: {e}")
finally:
    # Cleanup test resources
    try:
        Sandbox.delete_template("ragflow-permission-test")
    except Exception:  # ignore cleanup failures, e.g. the template was never created
        pass

Security Best Practices:

  1. Always use RAM user AccessKeys, never primary account AccessKeys
  2. Follow the principle of least privilege - grant only necessary permissions
  3. Rotate AccessKeys regularly - every 3-6 months is recommended
  4. Enable MFA - enable multi-factor authentication for RAM users
  5. Use secure storage - store credentials in environment variables or a secret management service; never hardcode them in source code
  6. Restrict IP access - add IP whitelist policies for RAM users if needed
  7. Monitor access logs - regularly review RAM user activity in ActionTrail

Reference Links:

2.2.3 E2B Provider

File: agent/sandbox/providers/e2b.py

SaaS integration with E2B Cloud.

  • Configuration: api_key, region (us/eu)
  • Languages: Python, JavaScript, Go, Bash, etc.
  • Security: Firecracker microVMs
  • Advantages: Global CDN, fast startup, multiple language support
  • Limitations: International network latency for China users

2.3 Provider Management

File: agent/sandbox/providers/manager.py

Since we only use one active provider at a time (configured globally), the provider management is simplified:

class ProviderManager:
    """Manages the currently active sandbox provider"""

    def __init__(self):
        self.current_provider: Optional[SandboxProvider] = None
        self.current_provider_name: Optional[str] = None

    def set_provider(self, name: str, provider: SandboxProvider):
        """Set the active provider"""
        self.current_provider = provider
        self.current_provider_name = name

    def get_provider(self) -> Optional[SandboxProvider]:
        """Get the active provider"""
        return self.current_provider

    def get_provider_name(self) -> Optional[str]:
        """Get the active provider name"""
        return self.current_provider_name

Rationale: With global configuration, there's only one active provider at a time. The provider manager simply holds a reference to the currently active provider, making it a thin wrapper rather than a complex multi-provider manager.
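Given this manager, start-up wiring might look like the sketch below. It reads sandbox.provider_type, looks up the provider class, decodes that provider's JSON config, initializes the provider, and registers it. Because configuration is database-only, a missing provider type is an error rather than a silent fallback. load_active_provider, the plain-dict settings store, and FakeProvider are hypothetical stand-ins for SystemSettingsService and the real provider classes.

```python
import json
from typing import Any, Dict, Mapping, Optional, Type


class ProviderManager:
    """Trimmed copy of the manager above."""

    def __init__(self) -> None:
        self.current_provider: Optional[Any] = None
        self.current_provider_name: Optional[str] = None

    def set_provider(self, name: str, provider: Any) -> None:
        self.current_provider = provider
        self.current_provider_name = name


def load_active_provider(settings: Mapping[str, str], registry: Mapping[str, Type],
                         manager: ProviderManager) -> Any:
    """Hypothetical loader: no defaults, every missing piece is an error."""
    provider_type = settings.get("sandbox.provider_type")
    if not provider_type:
        raise RuntimeError("Sandbox provider type not configured")
    if provider_type not in registry:
        raise RuntimeError(f"Unknown sandbox provider: {provider_type}")
    raw_config = settings.get(f"sandbox.{provider_type}")
    if raw_config is None:
        raise RuntimeError(f"Missing configuration for sandbox.{provider_type}")
    provider = registry[provider_type]()
    if not provider.initialize(json.loads(raw_config)):
        raise RuntimeError(f"Failed to initialize provider: {provider_type}")
    manager.set_provider(provider_type, provider)
    return provider


class FakeProvider:
    """Hypothetical stand-in that records the config it was given."""

    def initialize(self, config: Dict[str, Any]) -> bool:
        self.config = config
        return True
```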

3. Admin Configuration

3.1 Database Schema

Use the existing SystemSettings table for global sandbox configuration:

# In api/db/db_models.py

class SystemSettings(DataBaseModel):
    name = CharField(max_length=128, primary_key=True)
    source = CharField(max_length=32, null=False, index=False)
    data_type = CharField(max_length=32, null=False, index=False)
    value = CharField(max_length=1024, null=False, index=False)

Rationale: The sandbox manager is a system-level service shared by all tenants:

  • No per-tenant configuration needed (unlike LLM providers, where each tenant has its own API keys)
  • Fits alongside other global settings such as the system email and DOC_ENGINE
  • Managed by administrators only
  • Leverages existing SettingsMgr in admin interface

Storage Strategy: Each provider's configuration is stored as a single JSON object:

  • sandbox.provider_type - Active provider selection ("self_managed", "aliyun_codeinterpreter", "e2b")
  • sandbox.self_managed - JSON config for self-managed provider
  • sandbox.aliyun_codeinterpreter - JSON config for Aliyun Code Interpreter provider
  • sandbox.e2b - JSON config for E2B provider

Note: The model above shows the original definition, in which value is a CharField limited to 1024 characters. This change set converts SystemSettings.value to a TextField (see Database Schema Changes above), so provider configurations are no longer constrained by that limit.

3.2 Configuration Schema

Each provider's configuration is stored as a single JSON object in the value field:

Self-Managed Provider

{
  "name": "sandbox.self_managed",
  "source": "variable",
  "data_type": "json",
  "value": "{\"endpoint\": \"http://localhost:9385\", \"pool_size\": 10, \"max_memory\": \"256m\", \"timeout\": 30}"
}

Aliyun Code Interpreter

{
  "name": "sandbox.aliyun_codeinterpreter",
  "source": "variable",
  "data_type": "json",
  "value": "{\"access_key_id\": \"LTAI5t...\", \"access_key_secret\": \"xxxxx\", \"account_id\": \"1234567890...\", \"region\": \"cn-hangzhou\", \"timeout\": 30}"
}

E2B

{
  "name": "sandbox.e2b",
  "source": "variable",
  "data_type": "json",
  "value": "{\"api_key\": \"e2b_sk_...\", \"region\": \"us\", \"timeout\": 30}"
}

Active Provider Selection

{
  "name": "sandbox.provider_type",
  "source": "variable",
  "data_type": "string",
  "value": "self_managed"
}
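The escaped quotes in these records come from storing a JSON document inside a string column. The config dict is serialized once with json.dumps, and that string becomes value. A hypothetical helper for building such a record and reading it back could look like this:

```python
import json
from typing import Any, Dict


def make_sandbox_setting(provider_id: str, config: Dict[str, Any]) -> Dict[str, str]:
    """Build a SystemSettings-shaped record for one provider (hypothetical helper)."""
    return {
        "name": f"sandbox.{provider_id}",
        "source": "variable",
        "data_type": "json",
        "value": json.dumps(config),  # the whole config becomes one string
    }


def read_sandbox_setting(record: Dict[str, str]) -> Dict[str, Any]:
    """Decode the value field back into a config dict."""
    return json.loads(record["value"])


record = make_sandbox_setting("e2b", {"api_key": "e2b_sk_...", "region": "us", "timeout": 30})
print(json.dumps(record, indent=2))  # "value" shows escaped quotes, as in the examples above
```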

3.3 Provider Self-Describing Schema

Each provider class implements a static method to describe its configuration schema:

# agent/sandbox/providers/base.py

class SandboxProvider(ABC):
    """Base interface for all sandbox providers"""

    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> bool:
        """Initialize provider with configuration"""
        pass

    @abstractmethod
    def create_instance(self, template: str = "python") -> SandboxInstance:
        """Create a new sandbox instance"""
        pass

    @abstractmethod
    def execute_code(
        self,
        instance_id: str,
        code: str,
        language: str,
        timeout: int = 10
    ) -> ExecutionResult:
        """Execute code in the sandbox"""
        pass

    @abstractmethod
    def destroy_instance(self, instance_id: str) -> bool:
        """Destroy a sandbox instance"""
        pass

    @abstractmethod
    def health_check(self) -> bool:
        """Check if provider is healthy"""
        pass

    @abstractmethod
    def get_supported_languages(self) -> list[str]:
        """Get list of supported programming languages"""
        pass

    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        """Return configuration schema for this provider"""
        return {}

Example Implementation:

# agent/sandbox/providers/self_managed.py

class SelfManagedProvider(SandboxProvider):
    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        return {
            "endpoint": {
                "type": "string",
                "required": True,
                "label": "API Endpoint",
                "placeholder": "http://localhost:9385"
            },
            "pool_size": {
                "type": "integer",
                "default": 10,
                "label": "Container Pool Size",
                "min": 1,
                "max": 100
            },
            "max_memory": {
                "type": "string",
                "default": "256m",
                "label": "Max Memory per Container",
                "options": ["128m", "256m", "512m", "1g"]
            },
            "timeout": {
                "type": "integer",
                "default": 30,
                "label": "Execution Timeout (seconds)",
                "min": 5,
                "max": 300
            }
        }

# agent/sandbox/providers/aliyun_codeinterpreter.py

class AliyunCodeInterpreterProvider(SandboxProvider):
    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        return {
            "access_key_id": {
                "type": "string",
                "required": True,
                "secret": True,
                "label": "Access Key ID",
                "description": "Aliyun AccessKey ID for authentication"
            },
            "access_key_secret": {
                "type": "string",
                "required": True,
                "secret": True,
                "label": "Access Key Secret",
                "description": "Aliyun AccessKey Secret for authentication"
            },
            "account_id": {
                "type": "string",
                "required": True,
                "label": "Account ID",
                "description": "Aliyun primary account ID (主账号ID), required for API calls"
            },
            "region": {
                "type": "string",
                "default": "cn-hangzhou",
                "label": "Region",
                "options": ["cn-hangzhou", "cn-beijing", "cn-shanghai", "cn-shenzhen", "cn-guangzhou"],
                "description": "Aliyun region for Code Interpreter service"
            },
            "template_name": {
                "type": "string",
                "required": False,
                "label": "Template Name",
                "description": "Optional sandbox template name for pre-configured environments"
            },
            "timeout": {
                "type": "integer",
                "default": 30,
                "label": "Execution Timeout (seconds)",
                "min": 1,
                "max": 30,
                "description": "Code execution timeout (max 30 seconds - hard limit)"
            }
        }

# agent/sandbox/providers/e2b.py

class E2BProvider(SandboxProvider):
    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        return {
            "api_key": {
                "type": "string",
                "required": True,
                "secret": True,
                "label": "API Key"
            },
            "region": {
                "type": "string",
                "default": "us",
                "label": "Region",
                "options": ["us", "eu"]
            },
            "timeout": {
                "type": "integer",
                "default": 30,
                "label": "Execution Timeout (seconds)",
                "min": 5,
                "max": 300
            }
        }

Benefits of Self-Describing Providers:

  • Single source of truth - schema defined alongside implementation
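  • Easy to add new providers - a new provider class only needs an entry in the provider map (PROVIDER_CLASSES in the admin routes); its form fields come from the class itself
  • Consistency - schema stays in sync with provider code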
  • Flexible - frontend can use schema for validation or hardcode if preferred
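The set_sandbox_config endpoint below calls a validate_config(config, schema) helper whose result exposes .valid and .errors, but the helper itself is not shown. The sketch below is one way to write it against the schema format above (required, type, min/max, options). The ValidationResult type and the exact rules are assumptions, not the RAGFlow implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


@dataclass
class ValidationResult:
    valid: bool
    errors: List[str] = field(default_factory=list)


def validate_config(config: Dict[str, Any], schema: Dict[str, Dict]) -> ValidationResult:
    """Hypothetical schema-driven validator matching the endpoint's usage."""
    if not isinstance(config, dict):
        return ValidationResult(False, ["config must be a JSON object"])
    errors: List[str] = []
    for key, rules in schema.items():
        if key not in config or config[key] in (None, ""):
            if rules.get("required"):
                errors.append(f"{key}: required")
            continue  # optional and absent: the schema default applies
        value = config[key]
        check = _TYPE_CHECKS.get(rules.get("type", ""))
        if check is not None and not check(value):
            errors.append(f"{key}: expected {rules['type']}")
            continue  # skip range checks on a value of the wrong type
        if "min" in rules and value < rules["min"]:
            errors.append(f"{key}: must be >= {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{key}: must be <= {rules['max']}")
        if "options" in rules and value not in rules["options"]:
            errors.append(f"{key}: must be one of {rules['options']}")
    return ValidationResult(not errors, errors)
```

Provider-specific checks, such as URL or credential format, could then run in each provider's own validate_config() override.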

3.4 Admin API Endpoints

Follow existing pattern in admin/server/routes.py and use SettingsMgr:

# admin/server/routes.py (add new endpoints)

from flask import request, jsonify
import json
from api.db.services.system_settings_service import SystemSettingsService
from agent.sandbox.providers.self_managed import SelfManagedProvider
from agent.sandbox.providers.aliyun_codeinterpreter import AliyunCodeInterpreterProvider
from agent.sandbox.providers.e2b import E2BProvider
from admin.server.services import SettingsMgr

# Map provider IDs to their classes
PROVIDER_CLASSES = {
    "self_managed": SelfManagedProvider,
    "aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
    "e2b": E2BProvider,
}

@admin_bp.route('/api/admin/sandbox/providers', methods=['GET'])
def list_sandbox_providers():
    """List available sandbox providers with their schemas"""
    providers = []
    for provider_id, provider_class in PROVIDER_CLASSES.items():
        schema = provider_class.get_config_schema()
        providers.append({
            "id": provider_id,
            "name": provider_id.replace("_", " ").title(),
            "config_schema": schema
        })
    return jsonify({"data": providers})

@admin_bp.route('/api/admin/sandbox/config', methods=['GET'])
def get_sandbox_config():
    """Get current sandbox configuration"""
    # Get active provider
    active_provider_setting = SystemSettingsService.get_by_name("sandbox.provider_type")
    active_provider = active_provider_setting[0].value if active_provider_setting else None

    config = {"active": active_provider}

    # Load all provider configs
    for provider_id in PROVIDER_CLASSES.keys():
        setting = SystemSettingsService.get_by_name(f"sandbox.{provider_id}")
        if setting:
            try:
                config[provider_id] = json.loads(setting[0].value)
            except json.JSONDecodeError:
                config[provider_id] = {}
        else:
            # Return default values from schema
            provider_class = PROVIDER_CLASSES[provider_id]
            schema = provider_class.get_config_schema()
            config[provider_id] = {
                key: field_def.get("default", "")
                for key, field_def in schema.items()
            }

    return jsonify({"data": config})

@admin_bp.route('/api/admin/sandbox/config', methods=['POST'])
def set_sandbox_config():
    """
    Update sandbox provider configuration.

    Request Parameters:
    - provider_type: Provider identifier (e.g., "self_managed", "e2b")
    - config: Provider configuration dictionary
    - set_active: (optional) If True, also set this provider as active.
                  Default: True for backward compatibility.
                  Set to False to update config without switching providers.
    - test_connection: (optional) If True, test connection before saving

    Response: Success message
    """
    req = request.json
    provider_type = req.get('provider_type')
    config = req.get('config')
    set_active = req.get('set_active', True)  # Default to True

    # Validate provider exists
    if provider_type not in PROVIDER_CLASSES:
        return jsonify({"error": "Unknown provider"}), 400

    # Validate configuration against schema
    provider_class = PROVIDER_CLASSES[provider_type]
    schema = provider_class.get_config_schema()
    validation_result = validate_config(config, schema)
    if not validation_result.valid:
        return jsonify({"error": "Invalid config", "details": validation_result.errors}), 400

    # Test connection if requested
    if req.get('test_connection'):
        test_result = test_provider_connection(provider_type, config)
        if not test_result.success:
            return jsonify({"error": "Connection failed", "details": test_result.error}), 400

    # Store entire config as a single JSON record
    config_json = json.dumps(config)
    setting_name = f"sandbox.{provider_type}"

    existing = SystemSettingsService.get_by_name(setting_name)
    if existing:
        SettingsMgr.update_by_name(setting_name, config_json)
    else:
        SystemSettingsService.save(
            name=setting_name,
            source="variable",
            data_type="json",
            value=config_json
        )

    # Set as active provider if requested (default: True)
    if set_active:
        SettingsMgr.update_by_name("sandbox.provider_type", provider_type)

    return jsonify({"message": "Configuration saved"})

@admin_bp.route('/api/admin/sandbox/test', methods=['POST'])
def test_sandbox_connection():
    """Test connection to sandbox provider"""
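    provider_type = request.json.get('provider_type')
    config = request.json.get('config')

    # Reject unknown providers before attempting a connection
    if provider_type not in PROVIDER_CLASSES:
        return jsonify({"error": "Unknown provider"}), 400

    test_result = test_provider_connection(provider_type, config)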
    return jsonify({
        "success": test_result.success,
        "message": test_result.message,
        "latency_ms": test_result.latency_ms
    })

@admin_bp.route('/api/admin/sandbox/active', methods=['PUT'])
def set_active_sandbox_provider():
    """Set active sandbox provider"""
    provider_name = request.json.get('provider')

    if provider_name not in PROVIDER_CLASSES:
        return jsonify({"error": "Unknown provider"}), 400

    # Check if provider is configured
    provider_setting = SystemSettingsService.get_by_name(f"sandbox.{provider_name}")
    if not provider_setting:
        return jsonify({"error": "Provider not configured"}), 400

    SettingsMgr.update_by_name("sandbox.provider_type", provider_name)
    return jsonify({"message": "Active provider updated"})
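
The POST handler above calls validate_config(config, schema) and reads `.valid` and `.errors` from its result, but the helper itself is not specified. The following is a minimal sketch, assuming schema fields use the keys from the ConfigField type in section 4.3 (`type`, `required`, `min`, `max`, `options`); `ValidationResult` and the type mapping are hypothetical names, not RAGFlow code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ValidationResult:
    """Hypothetical result type exposing the .valid / .errors attributes the handler reads."""
    valid: bool
    errors: List[str] = field(default_factory=list)


# Assumed mapping from schema "type" strings to Python types
_TYPES = {"string": str, "integer": int, "boolean": bool}


def validate_config(config: Any, schema: Dict[str, Dict[str, Any]]) -> ValidationResult:
    """Check a provider config dict against its get_config_schema() definition."""
    if not isinstance(config, dict):
        return ValidationResult(False, ["config must be a JSON object"])
    errors: List[str] = []
    for key, spec in schema.items():
        value = config.get(key)
        if value is None or value == "":
            if spec.get("required"):
                errors.append(f"{key}: required")
            continue
        expected = _TYPES.get(spec.get("type", "string"), object)
        # bool is a subclass of int, so True/False must not pass as integers
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{key}: expected {spec.get('type')}")
            continue
        if "min" in spec and value < spec["min"]:
            errors.append(f"{key}: must be >= {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{key}: must be <= {spec['max']}")
        if "options" in spec and value not in spec["options"]:
            errors.append(f"{key}: must be one of {spec['options']}")
    return ValidationResult(not errors, errors)
```

Collecting every error instead of stopping at the first lets the admin UI highlight all bad fields in one round trip.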

4. Frontend Integration

4.1 Admin Settings UI

Location: web/src/pages/SandboxSettings/index.tsx

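import React, { useEffect, useState } from 'react';
import { Form, Select, Input, InputNumber, Switch, Button, Card, Space, Tag, message } from 'antd';
import { listSandboxProviders, getSandboxConfig, setSandboxConfig, testSandboxConnection } from '@/utils/api';

const { Option } = Select;

const SandboxSettings: React.FC = () => {
  const [form] = Form.useForm();
  const [providers, setProviders] = useState<Provider[]>([]);
  const [savedConfig, setSavedConfig] = useState<SandboxConfig | null>(null);
  const [selectedProvider, setSelectedProvider] = useState<string>('');
  const [testing, setTesting] = useState(false);

  // Load provider schemas and the stored configuration once on mount
  useEffect(() => {
    (async () => {
      const [providerRes, configRes] = await Promise.all([listSandboxProviders(), getSandboxConfig()]);
      setProviders(providerRes.data);
      setSavedConfig(configRes.data);
      setSelectedProvider(configRes.data.active ?? '');
    })();
  }, []);

  // Refill the form with the stored values whenever the selected provider changes
  useEffect(() => {
    if (!savedConfig || !selectedProvider) return;
    form.resetFields();
    form.setFieldsValue((savedConfig as unknown as Record<string, object>)[selectedProvider] ?? {});
  }, [savedConfig, selectedProvider, form]);

  const providerSchema = providers.find(p => p.id === selectedProvider);

  const handleSave = async () => {
    const values = await form.validateFields();
    await setSandboxConfig({ provider_type: selectedProvider, config: values, set_active: true });
    message.success('Configuration saved');
  };

  const handleTest = async () => {
    setTesting(true);
    try {
      const values = await form.validateFields();
      const res = (await testSandboxConnection(selectedProvider, values)) as { success?: boolean; message?: string };
      if (res?.success) {
        message.success('Connection succeeded');
      } else {
        message.error(res?.message ?? 'Connection failed');
      }
    } finally {
      setTesting(false);
    }
  };

  const renderConfigForm = () => {
    if (!providerSchema) return null;

    return (
      <Form form={form} layout="vertical">
        {Object.entries(providerSchema.config_schema).map(([key, schema]) => (
          <Form.Item
            key={key}
            name={key}
            label={schema.label}
            rules={[{ required: schema.required }]}
            valuePropName={schema.type === 'boolean' ? 'checked' : 'value'}
          >
            {schema.secret ? (
              <Input.Password placeholder={schema.placeholder} />
            ) : schema.type === 'boolean' ? (
              <Switch />
            ) : schema.type === 'integer' ? (
              <InputNumber min={schema.min} max={schema.max} />
            ) : schema.options ? (
              <Select>
                {schema.options.map((opt: string) => (
                  <Option key={opt} value={opt}>{opt}</Option>
                ))}
              </Select>
            ) : (
              <Input placeholder={schema.placeholder} />
            )}
          </Form.Item>
        ))}
      </Form>
    );
  };

  return (
    <Card title="Sandbox Provider Configuration">
      <Space direction="vertical" style={{ width: '100%' }}>
        {/* Provider Selection */}
        <Form.Item label="Select Provider">
          <Select
            style={{ width: '100%' }}
            onChange={setSelectedProvider}
            value={selectedProvider}
          >
            {providers.map(provider => (
              <Option key={provider.id} value={provider.id}>
                <Space>
                  {provider.name}
                  {(provider.tags ?? []).map(tag => (
                    <Tag key={tag}>{tag}</Tag>
                  ))}
                </Space>
              </Option>
            ))}
          </Select>
        </Form.Item>

        {/* Dynamic Configuration Form */}
        {renderConfigForm()}

        {/* Actions */}
        <Space>
          <Button type="primary" onClick={handleSave}>
            Save Configuration
          </Button>
          <Button onClick={handleTest} loading={testing}>
            Test Connection
          </Button>
        </Space>
      </Space>
    </Card>
  );
};

export default SandboxSettings;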

4.2 API Client

File: web/src/utils/api.ts

export async function listSandboxProviders() {
  return request<{ data: Provider[] }>('/api/admin/sandbox/providers');
}

export async function getSandboxConfig() {
  return request<{ data: SandboxConfig }>('/api/admin/sandbox/config');
}

export async function setSandboxConfig(config: SandboxConfigRequest) {
  return request('/api/admin/sandbox/config', {
    method: 'POST',
    data: config,
  });
}

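export async function testSandboxConnection(provider: string, config: Record<string, unknown>) {
  return request<{ success: boolean; message: string; latency_ms: number }>('/api/admin/sandbox/test', {
    method: 'POST',
    // The backend handler reads request.json['provider_type'], not 'provider'
    data: { provider_type: provider, config },
  });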
}

export async function setActiveSandboxProvider(provider: string) {
  return request('/api/admin/sandbox/active', {
    method: 'PUT',
    data: { provider },
  });
}

4.3 Type Definitions

File: web/src/types/sandbox.ts

interface Provider {
  id: string;
  name: string;
  config_schema: Record<string, ConfigField>;
  // Optional: list_sandbox_providers currently returns only id, name and config_schema
  description?: string;
  icon?: string;
  tags?: string[];
  supported_languages?: string[];
}

interface ConfigField {
  type: 'string' | 'integer' | 'boolean';
  required?: boolean;
  secret?: boolean;
  label: string;
  placeholder?: string;
  default?: any;
  options?: string[];
  min?: number;
  max?: number;
}

// Stored values keep their JSON types (e.g. pool_size: 20, enable_seccomp: true)
type ConfigValue = string | number | boolean;

// Configuration response grouped by provider
interface SandboxConfig {
  active: string | null;  // Currently active provider (null if none is set)
  self_managed?: Record<string, ConfigValue>;
  aliyun_codeinterpreter?: Record<string, ConfigValue>;
  e2b?: Record<string, ConfigValue>;
  // Add more providers as needed
}

// Request to update provider configuration
interface SandboxConfigRequest {
  provider_type: string;
  config: Record<string, string | number | boolean>;
  test_connection?: boolean;
  set_active?: boolean;
}

5. Integration with Agent System

5.1 Agent Component Usage

The agent system will use the sandbox through the simplified provider manager, loading global configuration from SystemSettings:

# In agent/components/code_executor.py

import json

from agent.sandbox.providers.base import ExecutionResult
from agent.sandbox.providers.manager import ProviderManager
from agent.sandbox.providers.self_managed import SelfManagedProvider
from agent.sandbox.providers.aliyun_codeinterpreter import AliyunCodeInterpreterProvider
from agent.sandbox.providers.e2b import E2BProvider
from api.db.services.system_settings_service import SystemSettingsService

# Map provider IDs to their classes
PROVIDER_CLASSES = {
    "self_managed": SelfManagedProvider,
    "aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
    "e2b": E2BProvider,
}

class CodeExecutorComponent:
    def __init__(self):
        self.provider_manager = ProviderManager()
        self._load_active_provider()

    def _load_active_provider(self):
        """Load the active provider from system settings"""
        # Get active provider
        active_setting = SystemSettingsService.get_by_name("sandbox.provider_type")
        if not active_setting:
            raise RuntimeError("No sandbox provider configured")

        active_provider = active_setting[0].value

        # Load configuration for active provider (single JSON record)
        provider_setting = SystemSettingsService.get_by_name(f"sandbox.{active_provider}")
        if not provider_setting:
            raise RuntimeError(f"Sandbox provider {active_provider} not configured")

        # Parse JSON configuration
        try:
            config = json.loads(provider_setting[0].value)
        except json.JSONDecodeError as e:
            raise RuntimeError(f"Invalid sandbox configuration for {active_provider}: {e}")

        # Get provider class
        provider_class = PROVIDER_CLASSES.get(active_provider)
        if not provider_class:
            raise RuntimeError(f"Unknown provider: {active_provider}")

        # Initialize provider
        provider = provider_class()
        provider.initialize(config)

        # Set as active provider in manager
        self.provider_manager.set_provider(active_provider, provider)

    def execute(self, code: str, language: str) -> ExecutionResult:
        """Execute code using the active provider"""
        provider = self.provider_manager.get_provider()

        if not provider:
            raise RuntimeError("No sandbox provider configured")

        # Create instance
        instance = provider.create_instance(template=language)

        try:
            # Execute code
            result = provider.execute_code(
                instance_id=instance.instance_id,
                code=code,
                language=language
            )
            return result
        finally:
            # Always cleanup
            provider.destroy_instance(instance.instance_id)
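
CodeExecutorComponent relies only on ProviderManager.set_provider() and get_provider(). A minimal sketch of that assumed contract follows (one active provider, swapped under a lock); the real agent/sandbox/providers/manager.py may expose more, and `get_provider_id()` is a hypothetical addition.

```python
import threading
from typing import Any, Optional


class ProviderManager:
    """Holds the single active sandbox provider (sketch of the assumed contract)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._provider_id: Optional[str] = None
        self._provider: Optional[Any] = None

    def set_provider(self, provider_id: str, provider: Any) -> None:
        # Swap id and instance together so readers never see a mismatched pair
        with self._lock:
            self._provider_id = provider_id
            self._provider = provider

    def get_provider(self) -> Optional[Any]:
        with self._lock:
            return self._provider

    def get_provider_id(self) -> Optional[str]:
        with self._lock:
            return self._provider_id
```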

6. Security Considerations

6.1 Credential Storage

  • Sensitive credentials (API keys, secrets) encrypted at rest in database
  • Use RAGFlow's existing encryption mechanisms (AES-256)
  • Never log or expose credentials in error messages or API responses
  • Credentials redacted in UI (show only last 4 characters)
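
The redaction rule can be applied server-side using the `secret` flag already present in each provider's config schema. A minimal sketch, assuming values arrive as the decoded JSON dict returned by the GET config handler; `redact_secret` and `redact_config` are hypothetical helpers.

```python
from typing import Any, Dict


def redact_secret(value: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters; short secrets are masked entirely."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def redact_config(config: Dict[str, Any], schema: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of config with every non-empty field marked "secret": True masked."""
    return {
        key: redact_secret(str(value)) if schema.get(key, {}).get("secret") and value else value
        for key, value in config.items()
    }
```

Secrets of four characters or fewer are fully masked, since showing their last four characters would reveal them. If redacted values are returned to the UI, the save path has to treat an unchanged masked value as "keep the stored secret"; otherwise saving the form would overwrite the real credential with asterisks.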

6.2 Tenant Isolation

  • Configuration: Global sandbox settings shared by all tenants (admin-only access)
  • Execution: Sandboxes never shared across tenants/sessions during runtime
  • Instance IDs: Scoped to tenant: {tenant_id}:{session_id}:{instance_id}
  • Network Isolation: Sandboxes of different tenants cannot reach each other over the network (VPC per tenant for SaaS providers)
  • Resource Quotas: Per-tenant limits on concurrent executions, total execution time
  • Audit Logging: All sandbox executions logged with tenant_id for traceability
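
The instance ID format above can be enforced with a small helper. A minimal sketch, assuming no component may itself contain `:` (otherwise parsing would be ambiguous); the function names are hypothetical.

```python
from typing import NamedTuple


class ScopedInstanceId(NamedTuple):
    tenant_id: str
    session_id: str
    instance_id: str


def make_scoped_id(tenant_id: str, session_id: str, instance_id: str) -> str:
    """Build '{tenant_id}:{session_id}:{instance_id}', rejecting empty or ':'-containing parts."""
    for part in (tenant_id, session_id, instance_id):
        if not part or ":" in part:
            raise ValueError(f"invalid id component: {part!r}")
    return f"{tenant_id}:{session_id}:{instance_id}"


def parse_scoped_id(scoped: str) -> ScopedInstanceId:
    parts = scoped.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed scoped id: {scoped!r}")
    return ScopedInstanceId(*parts)


def check_owner(scoped: str, tenant_id: str) -> ScopedInstanceId:
    """Refuse to operate on an instance that belongs to another tenant."""
    parsed = parse_scoped_id(scoped)
    if parsed.tenant_id != tenant_id:
        raise PermissionError("instance belongs to a different tenant")
    return parsed
```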

6.3 Resource Limits

  • Timeout limits per execution (configurable per provider, default 30s)
  • Memory/CPU limits enforced at provider level
  • Automatic cleanup of stale instances (max lifetime: 5 minutes)
  • Rate limiting per tenant (max concurrent executions: 10)
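
The per-tenant concurrency cap could be enforced with one bounded semaphore per tenant. This is an in-process sketch only: with several ragflow worker processes the counter would have to live in a shared store instead. `TenantConcurrencyLimiter` is a hypothetical name, and mapping the rejection to SB008 follows the error-code table in section 12.2.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator

MAX_CONCURRENT_PER_TENANT = 10  # limit stated in the resource limits above


class TenantConcurrencyLimiter:
    """Caps concurrent sandbox executions per tenant within one process."""

    def __init__(self, limit: int = MAX_CONCURRENT_PER_TENANT) -> None:
        self._limit = limit
        self._lock = threading.Lock()
        self._semaphores: DefaultDict[str, threading.BoundedSemaphore] = defaultdict(
            lambda: threading.BoundedSemaphore(self._limit)
        )

    def _semaphore(self, tenant_id: str) -> threading.BoundedSemaphore:
        # The lock prevents two threads from creating separate semaphores for one tenant
        with self._lock:
            return self._semaphores[tenant_id]

    @contextmanager
    def slot(self, tenant_id: str) -> Iterator[None]:
        sem = self._semaphore(tenant_id)
        # Fail fast instead of queueing; callers can surface this as SB008
        if not sem.acquire(blocking=False):
            raise RuntimeError("SB008: rate limit exceeded")
        try:
            yield
        finally:
            sem.release()
```

A caller would wrap each execution as `with limiter.slot(tenant_id): ...`, so the slot is released even when the execution raises.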

6.4 Code Security

  • For self-managed: AST-based security analysis before execution
  • Blocked operations: file system writes, network calls, system commands
  • Allowlist approach: only specific imports allowed
  • Runtime monitoring for malicious patterns
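
The AST check and import allowlist for the self-managed provider can be sketched with Python's standard `ast` module. The allowlist and blocked-call set below are hypothetical placeholders. Static analysis of this kind is easy to bypass (for example through `getattr` on builtins), so it complements gVisor isolation rather than replacing it.

```python
import ast
from typing import List, Set

# Hypothetical policy values; the real allowlist is not specified in this document
ALLOWED_IMPORTS: Set[str] = {"json", "math", "re", "datetime", "collections"}
BLOCKED_CALLS: Set[str] = {"open", "exec", "eval", "compile", "__import__"}


def find_violations(code: str) -> List[str]:
    """Return policy violations found by walking the AST (the code is never executed)."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    violations: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have module=None
        else:
            modules = []
        for module in modules:
            # Check the top-level package, so "os.path" is judged as "os"
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"line {node.lineno}: import of '{module}' not allowed")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
            violations.append(f"line {node.lineno}: call to '{node.func.id}' blocked")
    return violations
```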

6.5 Network Security

  • Self-managed: Network isolation by default, no external access
  • SaaS: HTTPS only, certificate pinning
  • IP whitelisting for self-managed endpoint access

7. Monitoring and Observability

7.1 Metrics to Track

Common Metrics (All Providers):

  • Execution success rate (target: >95%)
  • Average execution time (p50, p95, p99)
  • Error rate by error type
  • Active instance count
  • Queue depth (for self-managed pool)

Self-Managed Specific:

  • Container pool utilization (target: 60-80%)
  • Host resource usage (CPU, memory, disk)
  • Container creation latency
  • Container restart rate
  • gVisor runtime health

SaaS Specific:

  • API call latency by region
  • Rate limit usage and throttling events
  • Cost estimation (execution count × unit cost)
  • Provider availability (uptime %)
  • API error rate by error code
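
Success rate and latency percentiles can be computed from per-execution samples. A minimal sketch using the nearest-rank percentile definition (one common choice; monitoring backends may interpolate instead); `ExecutionSample` and `summarize` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ExecutionSample:
    duration_ms: float
    status: str  # "success" or an error type such as "timeout"


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples: List[ExecutionSample]) -> Dict[str, float]:
    if not samples:
        raise ValueError("no samples in window")
    durations = sorted(s.duration_ms for s in samples)
    successes = sum(1 for s in samples if s.status == "success")
    return {
        "success_rate": successes / len(samples),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }
```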

7.2 Logging

Structured logging for all provider operations:

{
  "timestamp": "2025-01-26T10:00:00Z",
  "tenant_id": "tenant_123",
  "provider": "aliyun_codeinterpreter",
  "operation": "execute_code",
  "instance_id": "inst_xyz",
  "language": "python",
  "code_hash": "sha256:...",
  "duration_ms": 1234,
  "status": "success",
  "exit_code": 0,
  "memory_used_mb": 64,
  "region": "cn-hangzhou"
}
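
A record in the format above could be produced as follows. The sketch assumes the field names of the example record; hashing the source into `code_hash` keeps user code out of the logs while still letting identical submissions be correlated. `build_execution_log` and the `sandbox` logger name are hypothetical.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict

logger = logging.getLogger("sandbox")


def build_execution_log(tenant_id: str, provider: str, instance_id: str,
                        language: str, code: str, duration_ms: int,
                        status: str, exit_code: int, **extra: Any) -> Dict[str, Any]:
    """Build one structured record; only a hash of the code is logged."""
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "tenant_id": tenant_id,
        "provider": provider,
        "operation": "execute_code",
        "instance_id": instance_id,
        "language": language,
        "code_hash": "sha256:" + hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "duration_ms": duration_ms,
        "status": status,
        "exit_code": exit_code,
        **extra,  # provider-specific fields such as memory_used_mb or region
    }


def log_execution(**fields: Any) -> None:
    # One JSON object per line is easy for log shippers to parse
    logger.info(json.dumps(build_execution_log(**fields), sort_keys=True))
```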

7.3 Alerts

Critical Alerts:

  • Provider availability < 99%
  • Error rate > 5%
  • Average execution time > 10s
  • Container pool exhaustion (0 available)

Warning Alerts:

  • Cost spike (2x daily average)
  • Rate limit approaching (>80%)
  • High memory usage (>90%)
  • Slow execution times (p95 > 5s)
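
The thresholds above translate directly into a rule check over an aggregation window. A sketch, assuming the metrics have already been aggregated into fractions and seconds; `WindowStats` and `evaluate_alerts` are hypothetical, and `pool_available` is `None` for SaaS providers, which have no container pool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WindowStats:
    availability: float            # successful health checks / total, 0..1
    error_rate: float              # failed executions / total, 0..1
    avg_exec_s: float
    p95_exec_s: float
    rate_limit_usage: float        # share of provider quota consumed, 0..1
    memory_usage: float            # share of memory in use, 0..1
    daily_cost: float
    avg_daily_cost: float
    pool_available: Optional[int] = None  # idle containers; None for SaaS providers


def evaluate_alerts(s: WindowStats) -> List[str]:
    alerts: List[str] = []
    # Critical thresholds (section 7.3)
    if s.availability < 0.99:
        alerts.append("CRITICAL: provider availability < 99%")
    if s.error_rate > 0.05:
        alerts.append("CRITICAL: error rate > 5%")
    if s.avg_exec_s > 10:
        alerts.append("CRITICAL: average execution time > 10s")
    if s.pool_available == 0:
        alerts.append("CRITICAL: container pool exhausted")
    # Warning thresholds
    if s.avg_daily_cost > 0 and s.daily_cost >= 2 * s.avg_daily_cost:
        alerts.append("WARNING: cost spike (2x daily average)")
    if s.rate_limit_usage > 0.8:
        alerts.append("WARNING: rate limit usage > 80%")
    if s.memory_usage > 0.9:
        alerts.append("WARNING: memory usage > 90%")
    if s.p95_exec_s > 5:
        alerts.append("WARNING: p95 execution time > 5s")
    return alerts
```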

8. Migration Path

8.1 Phase 1: Refactor Existing Code (Week 1-2)

Goals: Extract current implementation into provider pattern

Tasks:

  • Create agent/sandbox/providers/base.py with SandboxProvider interface
  • Implement agent/sandbox/providers/self_managed.py wrapping executor_manager
  • Create agent/sandbox/providers/manager.py for provider management
  • Write unit tests for self-managed provider
  • Document existing behavior and configuration

Deliverables:

  • Provider abstraction layer
  • Self-managed provider implementation
  • Unit test suite

8.2 Phase 2: Database Integration (Week 3)

Goals: Add sandbox configuration to admin system

Tasks:

  • Add sandbox entries to conf/system_settings.json initialization file
  • Extend SettingsMgr in admin/server/services.py with sandbox-specific methods
  • Add admin endpoints to admin/server/routes.py
  • Implement configuration validation logic
  • Add provider connection testing
  • Write API tests

Deliverables:

  • SystemSettings integration
  • Admin API endpoints (/api/admin/sandbox/*)
  • Configuration validation
  • API test suite

8.3 Phase 3: Frontend UI (Week 4)

Goals: Build admin settings interface

Tasks:

  • Create web/src/pages/SandboxSettings/index.tsx
  • Implement dynamic form generation from provider schema
  • Add connection testing UI
  • Create TypeScript types
  • Write frontend tests

Deliverables:

  • Admin settings UI
  • Type definitions
  • Frontend test suite

8.4 Phase 4: SaaS Provider Implementation (Week 5-6)

Goals: Implement Aliyun Code Interpreter and E2B providers

Tasks:

  • Implement agent/sandbox/providers/aliyun_codeinterpreter.py
  • Implement agent/sandbox/providers/e2b.py
  • Add provider-specific tests with mocking
  • Document provider-specific behaviors
  • Create provider setup guides

Deliverables:

  • Aliyun Code Interpreter provider
  • E2B provider
  • Provider documentation

8.5 Phase 5: Agent Integration (Week 7)

Goals: Update agent components to use new provider system

Tasks:

  • Update agent/components/code_executor.py to use ProviderManager
  • Implement fallback logic
  • Add tenant-specific provider loading
  • Update agent tests
  • Performance testing

Deliverables:

  • Agent integration
  • Fallback mechanism
  • Updated test suite

8.6 Phase 6: Monitoring & Documentation (Week 8)

Goals: Add observability and complete documentation

Tasks:

  • Implement metrics collection
  • Add structured logging
  • Configure alerts
  • Write deployment guide
  • Write user documentation
  • Create troubleshooting guide

Deliverables:

  • Monitoring dashboards
  • Complete documentation
  • Deployment guides

9. Testing Strategy

9.1 Unit Tests

Provider Tests (test/agent/sandbox/providers/test_*.py):

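from agent.sandbox.providers.self_managed import SelfManagedProvider

# Assumes an executor_manager is reachable at this endpoint
TEST_CONFIG = {"endpoint": "http://localhost:9385"}

class TestSelfManagedProvider:
    def test_initialize_with_config(self):
        provider = SelfManagedProvider()
        assert provider.initialize(TEST_CONFIG)

    def test_create_python_instance(self):
        provider = SelfManagedProvider()
        provider.initialize(TEST_CONFIG)
        instance = provider.create_instance("python")
        try:
            assert instance.status == "running"
        finally:
            provider.destroy_instance(instance.instance_id)

    def test_execute_code(self):
        provider = SelfManagedProvider()
        provider.initialize(TEST_CONFIG)
        instance = provider.create_instance("python")
        try:
            # User code must define main(); the runner calls it
            code = "def main():\n    print('hello')\n"
            result = provider.execute_code(instance.instance_id, code, "python")
            assert result.exit_code == 0
            assert "hello" in result.stdout
        finally:
            provider.destroy_instance(instance.instance_id)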

Configuration Tests:

  • Test configuration validation for each provider schema
  • Test error handling for invalid configurations
  • Test secret field redaction

9.2 Integration Tests

Provider Switching:

  • Test switching between providers
  • Test fallback mechanism
  • Test concurrent provider usage

Multi-Tenant Isolation:

  • Test tenant configuration isolation
  • Test instance ID scoping
  • Test resource separation

Admin API Tests:

  • Test CRUD operations for configurations
  • Test connection testing endpoint
  • Test validation error responses

9.3 E2E Tests

Complete Flow Tests:

def test_sandbox_execution_flow():
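    # Pseudocode: set_sandbox_config, create_agent_task, execute_agent_task and
    # get_active_instances stand in for test helpers
    # 1. Configure provider via admin API (POST /api/admin/sandbox/config)
    set_sandbox_config(provider_type="self_managed", config={...})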

    # 2. Create agent task with code execution
    task = create_agent_task(code="print('test')")

    # 3. Execute task
    result = execute_agent_task(task.id)

    # 4. Verify result
    assert result.status == "success"
    assert "test" in result.output

    # 5. Verify sandbox cleanup
    assert get_active_instances() == 0

Admin UI Tests:

  • Test provider configuration flow
  • Test connection testing
  • Test error handling in UI

9.4 Performance Tests

Load Testing:

  • Test 100 concurrent executions
  • Test pool exhaustion behavior
  • Test queue performance (self-managed)

Latency Testing:

  • Measure cold start time per provider
  • Measure execution latency percentiles
  • Compare provider performance

10. Cost Considerations

10.1 Self-Managed Costs

Infrastructure:

  • Server hosting: $X/month (depends on specs)
  • Maintenance: engineering time
  • Scaling: manual, requires additional servers

Pros:

  • Predictable costs
  • No per-execution fees
  • Full control over resources

Cons:

  • High initial setup cost
  • Operational overhead
  • Finite capacity

10.2 SaaS Costs

Aliyun Code Interpreter (estimated):

  • Pricing: execution time × memory configuration
  • Example: 1000 executions/day × 30s × $0.01/1000s = ~$0.30/day

E2B (estimated):

  • Pricing: $0.02/execution-second
  • Example: 1000 executions/day × 30s × $0.02/s = ~$600/day
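
Both estimates follow the same formula: executions × seconds per execution × price per second. A quick check using the document's estimated unit prices (not published rates); `daily_cost` is a hypothetical helper.

```python
def daily_cost(executions_per_day: int, seconds_per_execution: float, price_per_second: float) -> float:
    """Daily spend for a provider billed per execution-second."""
    return executions_per_day * seconds_per_execution * price_per_second


aliyun = daily_cost(1000, 30, 0.01 / 1000)   # $0.01 per 1000 s, roughly $0.30/day
e2b = daily_cost(1000, 30, 0.02)             # $0.02 per second, roughly $600/day
print(f"Aliyun ~${aliyun:.2f}/day, E2B ~${e2b:.2f}/day, ratio {e2b / aliyun:.0f}x")
```

Under these assumed prices the E2B estimate is about 2,000 times the Aliyun estimate for the same workload, so the unit price dominates any provider comparison.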

Pros:

  • No upfront costs
  • Automatic scaling
  • No maintenance

Cons:

  • Variable costs (can spike with usage)
  • Network dependency
  • Potential for runaway costs

10.3 Cost Optimization

Recommendations:

  1. Hybrid Approach: Use self-managed for base load, SaaS for spikes
  2. Cost Monitoring: Set budget alerts per tenant
  3. Resource Limits: Enforce max executions per tenant/day
  4. Caching: Reuse instances when possible (self-managed pool)
  5. Smart Routing: Route to cheapest provider based on availability
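
Smart routing needs a selection rule. A sketch, assuming each provider reports availability, an estimated unit cost, and its hard timeout (30 seconds for the Aliyun Code Interpreter); treating the self-managed marginal cost as 0 is an assumption, since its costs are fixed infrastructure. The current design activates a single provider via `sandbox.provider_type`, so this is future work; `ProviderStatus` and `pick_provider` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ProviderStatus:
    provider_id: str
    available: bool
    cost_per_second: float   # estimated unit price
    max_timeout_s: int       # hard per-execution limit


def pick_provider(candidates: Iterable[ProviderStatus], timeout_s: int) -> Optional[str]:
    """Cheapest available provider whose timeout limit fits the request, or None."""
    eligible = [c for c in candidates if c.available and c.max_timeout_s >= timeout_s]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.cost_per_second).provider_id
```

Returning `None` rather than forcing a choice lets the caller surface SB009 (provider unavailable) instead of silently running on a provider that would time out.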

11. Future Extensibility

The architecture supports easy addition of new providers:

11.1 Adding a New Provider

Step 1: Implement provider class with schema

# agent/sandbox/providers/new_provider.py
from typing import Any, Dict

from .base import SandboxProvider

class NewProvider(SandboxProvider):
    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        return {
            "api_key": {
                "type": "string",
                "required": True,
                "secret": True,
                "label": "API Key"
            },
            "region": {
                "type": "string",
                "default": "us-east-1",
                "label": "Region"
            }
        }

    def initialize(self, config: Dict[str, Any]) -> bool:
        self.api_key = config.get("api_key")
        self.region = config.get("region", "us-east-1")
        # Initialize client
        return True

    # Implement other abstract methods...

Step 2: Register in provider mapping

# In api/apps/sandbox_app.py or wherever providers are listed
from agent.sandbox.providers.new_provider import NewProvider

PROVIDER_CLASSES = {
    "self_managed": SelfManagedProvider,
    "aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
    "e2b": E2BProvider,
    "new_provider": NewProvider,  # Add here
}

No separate plugin registry needs updating: import the class and add it to each PROVIDER_CLASSES mapping (the admin routes and the agent-side loader each define one).

11.2 Potential Future Providers

  • GitHub Codespaces: For GitHub-integrated workflows
  • Gitpod: Cloud development environments
  • CodeSandbox: Frontend code execution
  • AWS Firecracker: Raw microVM management
  • Custom Provider: User-defined provider implementations

11.3 Advanced Features

Feature Pooling:

  • Share instances across executions (same language, same user)
  • Warm pool for reduced latency
  • Instance hibernation for cost savings

Feature Multi-Region:

  • Route to nearest region
  • Failover across regions
  • Regional cost optimization

Feature Hybrid Execution:

  • Split workloads between providers
  • Dynamic provider selection based on cost/performance
  • A/B testing for provider performance

12. Appendix

12.1 Configuration Examples

SystemSettings Initialization File (conf/system_settings.json - add these entries):

{
  "system_settings": [
    {
      "name": "sandbox.provider_type",
      "source": "variable",
      "data_type": "string",
      "value": "self_managed"
    },
    {
      "name": "sandbox.self_managed",
      "source": "variable",
      "data_type": "json",
      "value": "{\"endpoint\": \"http://sandbox-internal:9385\", \"pool_size\": 20, \"max_memory\": \"512m\", \"timeout\": 60, \"enable_seccomp\": true, \"enable_ast_analysis\": true}"
    },
    {
      "name": "sandbox.aliyun_codeinterpreter",
      "source": "variable",
      "data_type": "json",
      "value": "{\"access_key_id\": \"\", \"access_key_secret\": \"\", \"account_id\": \"\", \"region\": \"cn-hangzhou\", \"template_name\": \"\", \"timeout\": 30}"
    },
    {
      "name": "sandbox.e2b",
      "source": "variable",
      "data_type": "json",
      "value": "{\"api_key\": \"\", \"region\": \"us\", \"timeout\": 30}"
    }
  ]
}
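
Hand-escaping the `value` strings above is error-prone. A sketch that builds the same entries with `json.dumps`; `sandbox_settings` is a hypothetical helper, not part of RAGFlow.

```python
import json
from typing import Any, Dict, List


def sandbox_settings(active: str, provider_configs: Dict[str, Dict[str, Any]]) -> List[Dict[str, str]]:
    """Build conf/system_settings.json entries; json.dumps produces the escaped value strings."""
    if active not in provider_configs:
        raise ValueError(f"active provider {active!r} has no configuration")
    entries = [{"name": "sandbox.provider_type", "source": "variable",
                "data_type": "string", "value": active}]
    for provider_id, config in provider_configs.items():
        entries.append({
            "name": f"sandbox.{provider_id}",
            "source": "variable",
            "data_type": "json",
            "value": json.dumps(config),
        })
    return entries


settings = sandbox_settings("self_managed", {
    "self_managed": {"endpoint": "http://sandbox-internal:9385", "pool_size": 20, "timeout": 60},
    "e2b": {"api_key": "", "region": "us", "timeout": 30},
})
print(json.dumps({"system_settings": settings}, indent=2))
```

Requiring the active provider to have a configuration mirrors the check in the `PUT /api/admin/sandbox/active` handler.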

Admin API Request Example (POST to /api/admin/sandbox/config):

{
  "provider_type": "self_managed",
  "config": {
    "endpoint": "http://sandbox-internal:9385",
    "pool_size": 20,
    "max_memory": "512m",
    "timeout": 60,
    "enable_seccomp": true,
    "enable_ast_analysis": true
  },
  "test_connection": true,
  "set_active": true
}

Note: The config object in the request is a plain JSON object. The API will serialize it to a JSON string before storing in SystemSettings.

Admin API Response Example (GET from /api/admin/sandbox/config):

{
  "data": {
    "active": "self_managed",
    "self_managed": {
      "endpoint": "http://sandbox-internal:9385",
      "pool_size": 20,
      "max_memory": "512m",
      "timeout": 60,
      "enable_seccomp": true,
      "enable_ast_analysis": true
    },
    "aliyun_codeinterpreter": {
      "access_key_id": "",
      "access_key_secret": "",
      "account_id": "",
      "region": "cn-hangzhou",
      "template_name": "",
      "timeout": 30
    },
    "e2b": {
      "api_key": "",
      "region": "us",
      "timeout": 30
    }
  }
}

Note: The response deserializes the JSON strings back to objects for easier frontend consumption.

12.2 Error Codes

| Code | Description | Resolution |
|------|-------------|------------|
| SB001 | Provider not initialized | Configure provider in admin |
| SB002 | Invalid configuration | Check configuration values |
| SB003 | Connection failed | Check network and credentials |
| SB004 | Instance creation failed | Check provider capacity |
| SB005 | Execution timeout | Increase timeout or optimize code |
| SB006 | Out of memory | Reduce memory usage or increase limits |
| SB007 | Code blocked by security policy | Remove blocked imports/operations |
| SB008 | Rate limit exceeded | Reduce concurrency or upgrade plan |
| SB009 | Provider unavailable | Check provider status or use fallback |


Document Version: 1.0
Last Updated: 2025-01-26
Author: RAGFlow Team
Status: Design Specification - Ready for Review

Appendix C: Configuration Storage Considerations

Current Implementation

  • Storage: SystemSettings table with value field as TextField (unlimited length)
  • Migration: Database migration added to convert from CharField(1024) to TextField
  • Benefit: Supports arbitrarily long API keys, workspace IDs, and other SaaS provider credentials

Validation

  • Schema validation: Type checking, range validation, required field validation
  • Provider-specific validation: Custom validation via validate_config() method
  • Example: SelfManagedProvider validates URL format, timeout ranges, pool size constraints
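
A provider-specific validate_config() for the self-managed provider might look like the following sketch. The numeric bounds are hypothetical, since the limits SelfManagedProvider actually enforces are not listed in this document; `validate_self_managed` is likewise a hypothetical name.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse

# Hypothetical bounds for illustration only
TIMEOUT_RANGE = (1, 600)
POOL_SIZE_RANGE = (1, 100)


def validate_self_managed(config: Dict[str, Any]) -> List[str]:
    """Return human-readable errors; an empty list means the config is acceptable."""
    errors: List[str] = []
    url = urlparse(str(config.get("endpoint", "")))
    if url.scheme not in ("http", "https") or not url.hostname:
        errors.append("endpoint: must be an http(s) URL with a host")
    for key, (low, high) in (("timeout", TIMEOUT_RANGE), ("pool_size", POOL_SIZE_RANGE)):
        value = config.get(key)
        if value is None:
            continue  # optional fields fall back to schema defaults
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            errors.append(f"{key}: must be an integer in [{low}, {high}]")
    return errors
```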

Configuration Storage Format

Each provider's configuration is stored as JSON in SystemSettings.value:

  • sandbox.provider_type: Active provider selection
  • sandbox.self_managed: Self-managed provider JSON config
  • sandbox.aliyun_codeinterpreter: Aliyun provider JSON config
  • sandbox.e2b: E2B provider JSON config

Appendix D: Configuration Hot Reload Limitations

Current Behavior

Provider Configuration Requires Restart: When switching sandbox providers in the admin panel, the ragflow service must be restarted for changes to take effect.

Reason:

  • Admin and ragflow are separate processes
  • ragflow loads sandbox provider configuration only at startup
  • The get_provider_manager() function caches the provider globally
  • Configuration changes in MySQL are not automatically detected

Impact:

  • Switching from self_managed → aliyun_codeinterpreter requires a ragflow restart
  • Updating credentials/config requires ragflow restart
  • Not a dynamic configuration system

Workarounds:

  1. Production: Restart ragflow service after configuration changes:

    cd docker
    docker compose restart ragflow-server
    
  2. Development: Use the reload_provider() function in code:

    from agent.sandbox.client import reload_provider
    reload_provider()  # Reloads from MySQL settings
    

Future Enhancement: To support hot reload without restart, implement configuration change detection:

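# In agent/sandbox/client.py (ProviderManager, _provider_manager and
# _load_provider_from_settings are assumed to be defined elsewhere in the module)
from typing import Optional

from api.db.services.system_settings_service import SystemSettingsService

_config_timestamp: Optional[int] = None

def _latest_settings_timestamp() -> int:
    """Newest update_time across the provider selection and the active provider's config."""
    active = SystemSettingsService.get_by_name("sandbox.provider_type")
    if not active:
        return 0
    latest = active[0].update_time or 0
    # Also watch sandbox.<provider>, so credential edits trigger a reload, not just provider switches
    provider_cfg = SystemSettingsService.get_by_name(f"sandbox.{active[0].value}")
    if provider_cfg:
        latest = max(latest, provider_cfg[0].update_time or 0)
    return latest

def get_provider_manager() -> ProviderManager:
    global _provider_manager, _config_timestamp

    current_timestamp = _latest_settings_timestamp()
    if _config_timestamp is None or current_timestamp > _config_timestamp:
        # Configuration changed (or first call): rebuild the provider
        _provider_manager = None
        _load_provider_from_settings()
        _config_timestamp = current_timestamp

    return _provider_manager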

However, this adds at least one database query to every get_provider_manager() call, and therefore to every execute_code() call. For production use, an explicit restart is preferred for simplicity and reliability.

Appendix E: Arguments Parameter Support

Overview

All sandbox providers support passing arguments to the main() function in user code. This enables dynamic parameter injection for code execution.

Implementation Details

Base Interface:

# agent/sandbox/providers/base.py
@abstractmethod
def execute_code(
    self,
    instance_id: str,
    code: str,
    language: str,
    timeout: int = 10,
    arguments: Optional[Dict[str, Any]] = None
) -> ExecutionResult:
    """
    Execute code in the sandbox.

    The code should contain a main() function that will be called with:
    - Python: main(**arguments) if arguments provided, else main()
    - JavaScript: main(arguments) if arguments provided, else main()
    """
    pass

Provider Implementations:

  1. Self-Managed Provider (self_managed.py:164):

    • Passes arguments via HTTP API: "arguments": arguments or {}
    • executor_manager receives and passes to code via command line
    • Runner script: args = json.loads(sys.argv[1]) then result = main(**args)
  2. Aliyun Code Interpreter (aliyun_codeinterpreter.py:260-275):

    • Wraps user code to call main(**arguments) or main() if no arguments
    • Python example:
      if arguments:
          wrapped_code = f'''{code}
      
      if __name__ == "__main__":
          import json
          result = main(**json.loads({json.dumps(arguments)!r}))
          print(json.dumps(result) if isinstance(result, dict) else result)
      '''
      
    • JavaScript example:
      if arguments:
          wrapped_code = f'''{code}
      
      const result = main({json.dumps(arguments)});
      console.log(typeof result === 'object' ? JSON.stringify(result) : String(result));
      '''
      

Client Layer (client.py:138-190):

def execute_code(
    code: str,
    language: str = "python",
    timeout: int = 30,
    arguments: Optional[Dict[str, Any]] = None
) -> ExecutionResult:
    provider_manager = get_provider_manager()
    provider = provider_manager.get_provider()

    instance = provider.create_instance(template=language)
    try:
        result = provider.execute_code(
            instance_id=instance.instance_id,
            code=code,
            language=language,
            timeout=timeout,
            arguments=arguments  # Passed through to provider
        )
        return result
    finally:
        provider.destroy_instance(instance.instance_id)

CodeExec Tool Integration (code_exec.py:136-165):

def _execute_code(self, language: str, code: str, arguments: dict):
    # ... collect arguments from component configuration

    result = sandbox_execute_code(
        code=code,
        language=language,
        timeout=int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10 * 60)),
        arguments=arguments  # Passed through to sandbox client
    )

Usage Examples

Python Code with Arguments:

# User code
def main(name: str, count: int) -> dict:
    """Generate greeting"""
    return {"message": f"Hello {name}!" * count}

# Called with: arguments={"name": "World", "count": 3}
# Result: {"message": "Hello World!Hello World!Hello World!"}

JavaScript Code with Arguments:

// User code
function main(args) {
  const { name, count } = args;
  return `Hello ${name}!`.repeat(count);
}

// Called with: arguments={"name": "World", "count": 3}
// Result: "Hello World!Hello World!Hello World!"
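
The self-managed runner described above parses the arguments from JSON and calls `main(**args)`. A minimal sketch of that step, assuming the user code is available as a string; `run_user_code` is a hypothetical name, and executing into a fresh namespace stands in for whatever the real runner does. `exec` is not a security boundary, so this only makes sense inside the isolated sandbox. It also shows why JSON preserves types: `true` arrives as a Python `bool`.

```python
import json
from typing import Any, Dict, Optional


def run_user_code(code: str, arguments_json: Optional[str] = None) -> str:
    """Load user code, call main(**args), and return the text the runner would print."""
    namespace: Dict[str, Any] = {"__name__": "user_code"}
    exec(compile(code, "<user_code>", "exec"), namespace)
    main = namespace.get("main")
    if not callable(main):
        raise ValueError("code must define a main() function")
    # JSON decoding maps true/false/null and numbers to bool/None/int/float
    args = json.loads(arguments_json) if arguments_json else {}
    result = main(**args)
    # Mirror the documented wrappers: dicts go out as JSON, anything else as plain text
    return json.dumps(result) if isinstance(result, dict) else str(result)
```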

Important Notes

  1. Function Signature: Code MUST define a main() function

    • Python: def main(name, count) with parameters matching the argument keys, def main(**kwargs) to accept any keys, or def main() if no arguments are passed
    • JavaScript: function main(args) or function main() if no arguments
  2. Type Consistency: Arguments are passed as JSON, so types are preserved:

    • Numbers → int/float
    • Strings → str
    • Booleans → bool
    • Objects → dict (Python) / object (JavaScript)
    • Arrays → list (Python) / array (JavaScript)
  3. Return Value: Return value is serialized as JSON for parsing

    • Python: print(json.dumps(result)) if dict
    • JavaScript: console.log(JSON.stringify(result)) if object
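  4. Provider Alignment: All providers share the same execute_code(..., arguments=...) interface; self_managed and aliyun_codeinterpreter pass arguments as described above, and the e2b provider, currently a placeholder, must follow the same contract when implemented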