mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-28 22:26:36 +08:00

Zhichang Yu fd11aca8e5 feat: Implement pluggable multi-provider sandbox architecture (#12820)

## Summary

Implement a flexible sandbox provider system supporting both
self-managed (Docker) and SaaS (Aliyun Code Interpreter) backends for
secure code execution in agent workflows.

**Key Changes:**
- ✅ Aliyun Code Interpreter provider using official
`agentrun-sdk>=0.0.16`
- ✅ Self-managed provider with gVisor (runsc) security
- ✅ Arguments parameter support for dynamic code execution
- ✅ Database-only configuration (removed fallback logic)
- ✅ Configuration scripts for quick setup

Issue #12479

## Features

### 🔌 Provider Abstraction Layer

**1. Self-Managed Provider** (`agent/sandbox/providers/self_managed.py`)
- Wraps existing executor_manager HTTP API
- gVisor (runsc) for secure container isolation
- Configurable pool size, timeout, retry logic
- Languages: Python, Node.js, JavaScript
- ⚠️ **Requires**: gVisor installation, Docker, base images

**2. Aliyun Code Interpreter**
(`agent/sandbox/providers/aliyun_codeinterpreter.py`)
- SaaS integration using official agentrun-sdk
- Serverless microVM execution with auto-authentication
- Hard timeout: 30 seconds max
- Credentials: `AGENTRUN_ACCESS_KEY_ID`, `AGENTRUN_ACCESS_KEY_SECRET`,
`AGENTRUN_ACCOUNT_ID`, `AGENTRUN_REGION`
- Automatically wraps code to call `main()` function

**3. E2B Provider** (`agent/sandbox/providers/e2b.py`)
- Placeholder for future integration

### ⚙️ Configuration System

- `conf/system_settings.json`: Default provider =
`aliyun_codeinterpreter`
- `agent/sandbox/client.py`: Enforces database-only configuration
- Admin UI: `/admin/sandbox-settings`
- Configuration validation via `validate_config()` method
- Health checks for all providers

### 🎯 Key Capabilities

**Arguments Parameter Support:**
All providers support passing arguments to the `main()` function:
```python
# User code
def main(name: str, count: int) -> dict:
    return {"message": f"Hello {name}!" * count}

# Executed with: arguments={"name": "World", "count": 3}
# Result: {"message": "Hello World!Hello World!Hello World!"}
```
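
One way to picture the wrapping step is sketched below. It is a minimal illustration, not the providers' actual code: `wrap_user_code` and `run_locally` are hypothetical helpers, and the choice to print the return value of `main()` as JSON is an assumption.

```python
import contextlib
import io
import json


def wrap_user_code(code: str, arguments: dict) -> str:
    """Append a runner that calls main(**arguments) and prints the result as JSON.

    Hypothetical helper: the real providers may wrap code differently.
    """
    # Serialize the arguments to JSON, then embed that text as a Python string literal.
    args_literal = repr(json.dumps(arguments))
    runner = (
        "\n\n"
        "import json as _json\n"
        f"_result = main(**_json.loads({args_literal}))\n"
        "print(_json.dumps(_result))\n"
    )
    return code + runner


def run_locally(script: str) -> str:
    """Stand-in for the sandbox: run the script in-process and capture stdout.

    For illustration only; exec() provides no isolation.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(script, {"__name__": "__main__"})
    return buffer.getvalue().strip()


user_code = '''
def main(name: str, count: int) -> dict:
    return {"message": f"Hello {name}!" * count}
'''
output = run_locally(wrap_user_code(user_code, {"name": "World", "count": 3}))
print(output)
```

Passing the arguments as JSON text rather than interpolating them into the source keeps user-supplied values out of the generated code.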

**Self-Describing Providers:**
Each provider implements `get_config_schema()` returning form
configuration for Admin UI

**Error Handling:**
Structured `ExecutionResult` with stdout, stderr, exit_code,
execution_time

## Configuration Scripts

Two scripts for quick Aliyun sandbox setup:

**Shell Script (requires jq):**
```bash
source scripts/configure_aliyun_sandbox.sh
```

**Python Script (interactive):**
```bash
python3 scripts/configure_aliyun_sandbox.py
```

## Testing

```bash
# Unit tests
uv run pytest agent/sandbox/tests/test_providers.py -v

# Aliyun provider tests
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v

# Integration tests (requires credentials)
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v

# Quick SDK validation
python3 agent/sandbox/tests/verify_sdk.py
```

**Test Coverage:**
- 30 unit tests for provider abstraction
- Provider-specific tests for Aliyun
- Integration tests with real API
- Security tests for executor_manager

## Documentation

- `docs/develop/sandbox_spec.md` - Complete architecture specification
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration from legacy
sandbox
- `agent/sandbox/tests/QUICKSTART.md` - Quick start guide
- `agent/sandbox/tests/README.md` - Testing documentation

## Breaking Changes

⚠️ **Migration Required:**

1. **Directory Move**: `sandbox/` → `agent/sandbox/`
   - Update imports: `from sandbox.` → `from agent.sandbox.`

2. **Mandatory Configuration**:
   - SystemSettings must have `sandbox.provider_type` configured
   - Removed fallback default values
   - Configuration must exist in database (from
     `conf/system_settings.json`)

3. **Aliyun Credentials**:
   - Requires `AGENTRUN_*` environment variables (not `ALIYUN_*`)
   - `AGENTRUN_ACCOUNT_ID` is now required (Aliyun primary account ID)

4. **Self-Managed Provider**:
   - gVisor (runsc) must be installed for security
   - Install: `go install gvisor.dev/gvisor/runsc@latest`

## Database Schema Changes

```python
# api/db/db_models.py
# SystemSettings.value: CharField → TextField (unlimited config length)

# api/db/services/system_settings_service.py
# SystemSettingsService.get_by_name(): startswith → exact match (fixed query precision)
```
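
The effect of the `startswith` → exact match change can be seen on a plain dictionary. This is only a sketch: the in-memory `settings` mapping stands in for the SystemSettings table, and the `sandbox.e2b_backup` key is a hypothetical name invented to provoke a collision.

```python
# In-memory stand-in for the SystemSettings table (name -> value).
settings = {
    "sandbox.e2b": '{"region": "us"}',
    "sandbox.e2b_backup": '{"region": "eu"}',  # hypothetical key, for illustration only
}


def get_by_prefix(name: str) -> list[str]:
    """Prefix lookup: also returns unrelated settings that share the prefix."""
    return sorted(key for key in settings if key.startswith(name))


def get_by_exact_name(name: str) -> list[str]:
    """Exact lookup: returns at most the one setting that was asked for."""
    return [key for key in settings if key == name]


print(get_by_prefix("sandbox.e2b"))
print(get_by_exact_name("sandbox.e2b"))
```

With prefix matching, a caller that reads the first row of the result could pick up the wrong record; exact matching removes that ambiguity.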

## Files Changed

### Backend (Python)
- `agent/sandbox/providers/base.py` - SandboxProvider ABC interface
- `agent/sandbox/providers/manager.py` - ProviderManager
- `agent/sandbox/providers/self_managed.py` - Self-managed provider
- `agent/sandbox/providers/aliyun_codeinterpreter.py` - Aliyun provider
- `agent/sandbox/providers/e2b.py` - E2B provider (placeholder)
- `agent/sandbox/client.py` - Unified client (enforces DB-only config)
- `agent/tools/code_exec.py` - Updated to use provider system
- `admin/server/services.py` - SandboxMgr with registry & validation
- `admin/server/routes.py` - 5 sandbox API endpoints
- `conf/system_settings.json` - Default: aliyun_codeinterpreter
- `api/db/db_models.py` - TextField for SystemSettings.value
- `api/db/services/system_settings_service.py` - Exact match query

### Frontend (TypeScript/React)
- `web/src/pages/admin/sandbox-settings.tsx` - Settings UI
- `web/src/services/admin-service.ts` - Sandbox service functions
- `web/src/services/admin.service.d.ts` - Type definitions
- `web/src/utils/api.ts` - Sandbox API endpoints

### Documentation
- `docs/develop/sandbox_spec.md` - Architecture spec
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration guide
- `agent/sandbox/tests/QUICKSTART.md` - Quick start
- `agent/sandbox/tests/README.md` - Testing guide

### Configuration Scripts
- `scripts/configure_aliyun_sandbox.sh` - Shell script (jq)
- `scripts/configure_aliyun_sandbox.py` - Python script

### Tests
- `agent/sandbox/tests/test_providers.py` - 30 unit tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter.py` - Provider tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py` -
Integration tests
- `agent/sandbox/tests/verify_sdk.py` - SDK validation

## Architecture

```
Admin UI → Admin API → SandboxMgr → ProviderManager → [SelfManaged|Aliyun|E2B]
                                      ↓
                                  SystemSettings
```

## Usage

### 1. Configure Provider

**Via Admin UI:**
1. Navigate to `/admin/sandbox-settings`
2. Select provider (Aliyun Code Interpreter / Self-Managed)
3. Fill in configuration
4. Click "Test Connection" to verify
5. Click "Save" to apply

**Via Configuration Scripts:**
```bash
# Aliyun provider
export AGENTRUN_ACCESS_KEY_ID="xxx"
export AGENTRUN_ACCESS_KEY_SECRET="yyy"
export AGENTRUN_ACCOUNT_ID="zzz"
export AGENTRUN_REGION="cn-shanghai"
source scripts/configure_aliyun_sandbox.sh
```

### 2. Restart Service

```bash
cd docker
docker compose restart ragflow-server
```

### 3. Execute Code in Agent

```python
from agent.sandbox.client import execute_code

result = execute_code(
    code='def main(name: str) -> dict: return {"message": f"Hello {name}!"}',
    language="python",
    timeout=30,
    arguments={"name": "World"}
)

print(result.stdout)  # {"message": "Hello World!"}
```
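
Callers will usually want to check the exit code before trusting `stdout`. The sketch below assumes that `stdout` carries the JSON-encoded return value of `main()`, as the example above suggests; the local `ExecutionResult` dataclass only mirrors the documented fields, and `parse_main_result` is a hypothetical helper.

```python
import json
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    """Local mirror of the structured result fields named in this description."""
    stdout: str
    stderr: str
    exit_code: int
    execution_time: float


def parse_main_result(result: ExecutionResult) -> dict:
    """Return the decoded value of main(), or raise with the captured stderr."""
    if result.exit_code != 0:
        raise RuntimeError(
            f"sandbox exited with {result.exit_code}: {result.stderr.strip()}"
        )
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        raise ValueError(f"stdout is not valid JSON: {result.stdout!r}") from exc


ok = ExecutionResult(
    stdout='{"message": "Hello World!"}', stderr="", exit_code=0, execution_time=0.12
)
print(parse_main_result(ok)["message"])
```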

## Troubleshooting

### "Container pool is busy" (Self-Managed)
- **Cause**: Pool exhausted (default: 1 container in `.env`)
- **Fix**: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` to 5+

### "Sandbox provider type not configured"
- **Cause**: Database missing configuration
- **Fix**: Run config script or set via Admin UI

### "gVisor not found"
- **Cause**: runsc not installed
- **Fix**: `go install gvisor.dev/gvisor/runsc@latest && sudo cp
~/go/bin/runsc /usr/local/bin/`

### Aliyun authentication errors
- **Cause**: Wrong environment variable names
- **Fix**: Use `AGENTRUN_*` prefix (not `ALIYUN_*`)

## Checklist

- [x] All tests passing (30 unit tests + integration tests)
- [x] Documentation updated (spec, migration guide, quickstart)
- [x] Type definitions added (TypeScript)
- [x] Admin UI implemented
- [x] Configuration validation
- [x] Health checks implemented
- [x] Error handling with structured results
- [x] Breaking changes documented
- [x] Configuration scripts created
- [x] gVisor requirements documented

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-01-28 13:28:21 +08:00

56 KiB

RAGFlow Sandbox Multi-Provider Architecture - Design Specification

1. Overview

1.1 Goals

Enable RAGFlow to support multiple sandbox deployment modes:

Self-Managed: On-premise deployment using Daytona/Docker (current implementation)
SaaS Providers: Cloud-based sandbox services (Aliyun Code Interpreter, E2B)

1.2 Key Requirements

Provider-agnostic interface for sandbox operations
Admin-configurable provider settings with dynamic schema
Multi-tenant isolation (1:1 session-to-sandbox mapping)
Graceful fallback and error handling
Unified monitoring and observability

2. Architecture Design

2.1 Provider Abstraction Layer

Location: agent/sandbox/providers/

Define a unified SandboxProvider interface:

# agent/sandbox/providers/base.py
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional
from dataclasses import dataclass

@dataclass
class SandboxInstance:
    instance_id: str
    provider: str
    status: str  # running, stopped, error
    metadata: Dict[str, Any]

@dataclass
class ExecutionResult:
    stdout: str
    stderr: str
    exit_code: int
    execution_time: float
    metadata: Dict[str, Any]

class SandboxProvider(ABC):
    """Base interface for all sandbox providers"""

    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> bool:
        """Initialize provider with configuration"""
        pass

    @abstractmethod
    def create_instance(self, template: str = "python") -> SandboxInstance:
        """Create a new sandbox instance"""
        pass

    @abstractmethod
    def execute_code(
        self,
        instance_id: str,
        code: str,
        language: str,
        timeout: int = 10
    ) -> ExecutionResult:
        """Execute code in the sandbox"""
        pass

    @abstractmethod
    def destroy_instance(self, instance_id: str) -> bool:
        """Destroy a sandbox instance"""
        pass

    @abstractmethod
    def health_check(self) -> bool:
        """Check if provider is healthy"""
        pass

    @abstractmethod
    def get_supported_languages(self) -> list[str]:
        """Get list of supported programming languages"""
        pass

    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        """
        Return configuration schema for this provider.

        Returns a dictionary mapping field names to their schema definitions,
        including type, required status, validation rules, labels, and descriptions.
        """
        # Default: no configurable fields; providers override this method.
        return {}

    def validate_config(self, config: Dict[str, Any]) -> tuple[bool, Optional[str]]:
        """
        Validate provider-specific configuration.

        This method allows providers to implement custom validation logic beyond
        the basic schema validation. Override this method to add provider-specific
        checks like URL format validation, API key format validation, etc.

        Args:
            config: Configuration dictionary to validate

        Returns:
            Tuple of (is_valid, error_message):
                - is_valid: True if configuration is valid, False otherwise
                - error_message: Error message if invalid, None if valid
        """
        # Default implementation: no custom validation
        return True, None
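
As an illustration of the provider-specific checks that validate_config() is meant to host, the sketch below validates a self-managed style configuration. It is written as a standalone function so it runs on its own; `validate_self_managed_config` is a hypothetical name, and the specific rules (http/https endpoint, positive pool size) are assumptions rather than the provider's actual checks. The return value follows the (is_valid, error_message) contract described above.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def validate_self_managed_config(config: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (True, None) when the config looks usable, else (False, reason)."""
    endpoint = config.get("endpoint")
    if not isinstance(endpoint, str):
        return False, "endpoint must be a string"

    # URL format check: require an explicit http(s) scheme and a host part.
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False, f"endpoint must be an http(s) URL, got {endpoint!r}"

    # bool is a subclass of int in Python, so reject it explicitly.
    pool_size = config.get("pool_size", 10)
    if isinstance(pool_size, bool) or not isinstance(pool_size, int) or pool_size < 1:
        return False, "pool_size must be a positive integer"

    return True, None


print(validate_self_managed_config({"endpoint": "http://localhost:9385"}))
print(validate_self_managed_config({"endpoint": "localhost:9385"}))
```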

2.2 Provider Implementations

2.2.1 Self-Managed Provider

File: agent/sandbox/providers/self_managed.py

Wraps the existing executor_manager implementation.

Prerequisites:

gVisor (runsc): Required for secure container isolation. Install with:
```
go install gvisor.dev/gvisor/runsc@latest
sudo cp ~/go/bin/runsc /usr/local/bin/
runsc --version
```
Or download from: https://github.com/google/gvisor/releases
Docker: Docker runtime with gVisor support

Base Images: Pull sandbox base images:

docker pull infiniflow/sandbox-base-python:latest
docker pull infiniflow/sandbox-base-nodejs:latest

Configuration: Docker API endpoint, pool size, resource limits

endpoint: HTTP endpoint (default: "http://localhost:9385")
timeout: Request timeout in seconds (default: 30)
max_retries: Maximum retry attempts (default: 3)
pool_size: Container pool size (default: 10)

Languages: Python, Node.js, JavaScript

Security: gVisor (runsc runtime), seccomp, read-only filesystem, memory limits

Advantages:

Low latency (<90ms), data privacy, full control
No per-execution costs
Supports arguments parameter for passing data to main() function

Limitations:

Operational overhead, finite resources
Requires gVisor installation for security
Pool exhaustion causes "Container pool is busy" errors

Common Issues:

"Container pool is busy": Increase SANDBOX_EXECUTOR_MANAGER_POOL_SIZE (default: 1 in .env, should be 5+)
Container creation fails: Ensure gVisor is installed and accessible at /usr/local/bin/runsc
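
The "Container pool is busy" behaviour can be reproduced with a small bounded pool. This is a sketch under stated assumptions, not the executor_manager implementation: `ContainerPool` and `PoolBusyError` are hypothetical names, and containers are represented by plain string IDs.

```python
import queue
from typing import Iterable


class PoolBusyError(RuntimeError):
    """Raised when no idle container becomes available in time."""


class ContainerPool:
    """Fixed-size pool of pre-created containers (represented here by IDs)."""

    def __init__(self, container_ids: Iterable[str], acquire_timeout: float = 0.1):
        self._idle: "queue.Queue[str]" = queue.Queue()
        for container_id in container_ids:
            self._idle.put(container_id)
        self._acquire_timeout = acquire_timeout

    def acquire(self) -> str:
        """Take an idle container, or fail once the wait times out."""
        try:
            return self._idle.get(timeout=self._acquire_timeout)
        except queue.Empty:
            raise PoolBusyError("Container pool is busy") from None

    def release(self, container_id: str) -> None:
        """Return a container to the pool after execution finishes."""
        self._idle.put(container_id)


pool = ContainerPool(["container-1"])  # pool size 1, like the .env default
busy = pool.acquire()
try:
    pool.acquire()  # a second concurrent request finds nothing idle
except PoolBusyError as exc:
    print(exc)
pool.release(busy)
```

With a pool size of 1, any two overlapping executions trigger the error, which is why a larger pool is recommended.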

2.2.2 Aliyun Code Interpreter Provider

File: agent/sandbox/providers/aliyun_codeinterpreter.py

SaaS integration with Aliyun Function Compute Code Interpreter service using the official agentrun-sdk.

Official Resources:

API Documentation: https://help.aliyun.com/zh/functioncompute/fc/sandbox-sandbox-code-interepreter
Official SDK: https://github.com/Serverless-Devs/agentrun-sdk-python
SDK Docs: https://docs.agent.run

Implementation:

Uses official agentrun-sdk package
SDK handles authentication (AccessKey signature) automatically
Supports environment variable configuration
Structured error handling with ServerError exceptions

Configuration:

access_key_id: Aliyun AccessKey ID
access_key_secret: Aliyun AccessKey Secret
account_id: Aliyun primary account ID (主账号ID) - Required for API calls
region: Region (cn-hangzhou, cn-beijing, cn-shanghai, cn-shenzhen, cn-guangzhou)
template_name: Optional sandbox template name for pre-configured environments
timeout: Execution timeout (max 30 seconds - hard limit)

Languages: Python, JavaScript

Security: Serverless microVM isolation, 30-second hard timeout limit

Advantages:

Official SDK with automatic signature handling
Unlimited scalability, no maintenance
China region support with low latency
Built-in file system management
Support for execution contexts (Jupyter kernel)
Context-based execution for state persistence

Limitations:

Network dependency
30-second execution time limit (hard limit)
Pay-as-you-go costs
Requires Aliyun primary account ID for API calls

Setup Instructions - Creating a RAM User with Minimal Privileges:

⚠️ Security Warning: Never use your Aliyun primary account (root account) AccessKey for SDK operations. Primary accounts have full resource permissions, and leaked credentials pose significant security risks.

Step 1: Create a RAM User

Log in to RAM Console
Navigate to Identities → Users
Click Create User
Configure the user:
- Username: e.g., ragflow-sandbox-user
- Display Name: e.g., RAGFlow Sandbox Service Account
- Access Mode: Check ✅ OpenAPI/Programmatic Access (this creates an AccessKey)
- Console Login: Optional (not needed for SDK-only access)
Click OK and save the AccessKey ID and Secret immediately (displayed only once!)

Step 2: Create a Custom Authorization Policy

Navigate to Permissions → Policies → Create Policy → Custom Policy → Configuration Script (JSON)

Choose one of the following policy options based on your security requirements:

Option A: Minimal Privilege Policy (Recommended)

Grants only the permissions required by the AgentRun SDK:

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "agentrun:CreateTemplate",
        "agentrun:GetTemplate",
        "agentrun:UpdateTemplate",
        "agentrun:DeleteTemplate",
        "agentrun:ListTemplates",
        "agentrun:CreateSandbox",
        "agentrun:GetSandbox",
        "agentrun:DeleteSandbox",
        "agentrun:StopSandbox",
        "agentrun:ListSandboxes",
        "agentrun:CreateContext",
        "agentrun:ExecuteCode",
        "agentrun:DeleteContext",
        "agentrun:ListContexts",
        "agentrun:CreateFile",
        "agentrun:GetFile",
        "agentrun:DeleteFile",
        "agentrun:ListFiles",
        "agentrun:CreateProcess",
        "agentrun:GetProcess",
        "agentrun:KillProcess",
        "agentrun:ListProcesses",
        "agentrun:CreateRecording",
        "agentrun:GetRecording",
        "agentrun:DeleteRecording",
        "agentrun:ListRecordings",
        "agentrun:CheckHealth"
      ],
      "Resource": [
        "acs:agentrun:*:{account_id}:template/*",
        "acs:agentrun:*:{account_id}:sandbox/*"
      ]
    }
  ]
}

Replace {account_id} with your Aliyun primary account ID
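
The placeholder substitution can also be scripted. The sketch below is a minimal example: `render_policy` is a hypothetical helper, the template repeats only a few of the Option A actions for brevity, and the account ID shown is a made-up value.

```python
import json

# Abbreviated template: only a few of the Option A actions are repeated here.
POLICY_TEMPLATE = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "agentrun:CreateSandbox",
                "agentrun:GetSandbox",
                "agentrun:DeleteSandbox",
            ],
            "Resource": [
                "acs:agentrun:*:{account_id}:template/*",
                "acs:agentrun:*:{account_id}:sandbox/*",
            ],
        }
    ],
}


def render_policy(account_id: str) -> str:
    """Return the policy JSON with every {account_id} placeholder filled in."""
    if not account_id or "{" in account_id or "}" in account_id:
        raise ValueError("account_id must be a concrete, non-empty value")
    rendered = json.dumps(POLICY_TEMPLATE, indent=2).replace("{account_id}", account_id)
    json.loads(rendered)  # sanity check: the result must still be valid JSON
    return rendered


policy = render_policy("1234567890")  # made-up account ID
print(policy)
```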

Option B: Resource-Level Privilege Control (Most Secure)

Limits access to specific resource prefixes:

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "agentrun:CreateTemplate",
        "agentrun:GetTemplate",
        "agentrun:UpdateTemplate",
        "agentrun:DeleteTemplate",
        "agentrun:ListTemplates"
      ],
      "Resource": "acs:agentrun:*:{account_id}:template/ragflow-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "agentrun:CreateSandbox",
        "agentrun:GetSandbox",
        "agentrun:DeleteSandbox",
        "agentrun:StopSandbox",
        "agentrun:ListSandboxes",
        "agentrun:CheckHealth"
      ],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*"
    },
    {
      "Effect": "Allow",
      "Action": ["agentrun:*"],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*/context/*"
    },
    {
      "Effect": "Allow",
      "Action": ["agentrun:*"],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*/file/*"
    },
    {
      "Effect": "Allow",
      "Action": ["agentrun:*"],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*/process/*"
    },
    {
      "Effect": "Allow",
      "Action": ["agentrun:*"],
      "Resource": "acs:agentrun:*:{account_id}:sandbox/*/recording/*"
    }
  ]
}

This limits template creation to only those prefixed with ragflow-*

Option C: Full Access (Not Recommended for Production)

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "agentrun:*",
      "Resource": "*"
    }
  ]
}

Step 3: Authorize the RAM User

Return to Users list
Find the user you just created (e.g., ragflow-sandbox-user)
Click Add Permissions in the Actions column
In the Custom Policy tab, select the policy you created in Step 2
Click OK

Step 4: Configure RAGFlow with the RAM User Credentials

After creating the RAM user and obtaining the AccessKey, configure it in RAGFlow's admin settings or environment variables:

# Method 1: Environment variables (for development/testing)
export AGENTRUN_ACCESS_KEY_ID="LTAI5t..."  # RAM user's AccessKey ID
export AGENTRUN_ACCESS_KEY_SECRET="xxx..."  # RAM user's AccessKey Secret
export AGENTRUN_ACCOUNT_ID="123456789..."  # Your primary account ID
export AGENTRUN_REGION="cn-hangzhou"

Or via Admin UI (recommended for production):

Navigate to Admin Settings → Sandbox Providers
Select Aliyun Code Interpreter provider
Fill in the configuration:
- access_key_id: RAM user's AccessKey ID
- access_key_secret: RAM user's AccessKey Secret
- account_id: Your primary account ID
- region: e.g., cn-hangzhou
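
When the environment-variable method is used, a quick pre-flight check helps catch missing or misnamed variables. The sketch below is an assumption-laden helper, not part of RAGFlow: it treats all four AGENTRUN_* variables as required and flags any ALIYUN_* variables as likely leftovers from an older setup.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = (
    "AGENTRUN_ACCESS_KEY_ID",
    "AGENTRUN_ACCESS_KEY_SECRET",
    "AGENTRUN_ACCOUNT_ID",
    "AGENTRUN_REGION",
)


def missing_agentrun_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required AGENTRUN_* variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def legacy_prefixed_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return variables that use the ALIYUN_ prefix instead of AGENTRUN_."""
    env = os.environ if env is None else env
    return sorted(name for name in env if name.startswith("ALIYUN_"))


if __name__ == "__main__":
    for name in missing_agentrun_vars():
        print(f"missing: {name}")
    for name in legacy_prefixed_vars():
        print(f"unexpected prefix (use AGENTRUN_*): {name}")
```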

Step 5: Verify Permissions

Test if the RAM user permissions are correctly configured:

from agentrun.sandbox import Sandbox, TemplateInput, TemplateType

try:
    # Test template creation
    template = Sandbox.create_template(
        input=TemplateInput(
            template_name="ragflow-permission-test",
            template_type=TemplateType.CODE_INTERPRETER
        )
    )
    print("✅ RAM user permissions are correctly configured")
except Exception as e:
    print(f"❌ Permission test failed: {e}")
finally:
    # Cleanup test resources
    try:
        Sandbox.delete_template("ragflow-permission-test")
    except Exception:
        # Best-effort cleanup: the template may not exist if creation failed
        pass

Security Best Practices:

✅ Always use RAM user AccessKeys, never primary account AccessKeys
✅ Follow the principle of least privilege - grant only necessary permissions
✅ Rotate AccessKeys regularly - recommend every 3-6 months
✅ Enable MFA - enable multi-factor authentication for RAM users
✅ Use secure storage - store credentials in environment variables or secret management services, never hardcode in code
✅ Restrict IP access - add IP whitelist policies for RAM users if needed
✅ Monitor access logs - regularly check RAM user access logs in ActionTrail

2.2.3 E2B Provider

File: agent/sandbox/providers/e2b.py

SaaS integration with E2B Cloud.

Configuration: api_key, region (us/eu)
Languages: Python, JavaScript, Go, Bash, etc.
Security: Firecracker microVMs
Advantages: Global CDN, fast startup, multiple language support
Limitations: International network latency for China users

2.3 Provider Management

File: agent/sandbox/providers/manager.py

Since we only use one active provider at a time (configured globally), the provider management is simplified:

class ProviderManager:
    """Manages the currently active sandbox provider"""

    def __init__(self):
        self.current_provider: Optional[SandboxProvider] = None
        self.current_provider_name: Optional[str] = None

    def set_provider(self, name: str, provider: SandboxProvider):
        """Set the active provider"""
        self.current_provider = provider
        self.current_provider_name = name

    def get_provider(self) -> Optional[SandboxProvider]:
        """Get the active provider"""
        return self.current_provider

    def get_provider_name(self) -> Optional[str]:
        """Get the active provider name"""
        return self.current_provider_name

Rationale: With global configuration, there's only one active provider at a time. The provider manager simply holds a reference to the currently active provider, making it a thin wrapper rather than a complex multi-provider manager.

3. Admin Configuration

3.1 Database Schema

Use the existing SystemSettings table for global sandbox configuration:

# In api/db/db_models.py

class SystemSettings(DataBaseModel):
    name = CharField(max_length=128, primary_key=True)
    source = CharField(max_length=32, null=False, index=False)
    data_type = CharField(max_length=32, null=False, index=False)
    value = CharField(max_length=1024, null=False, index=False)

Rationale: Sandbox manager is a system-level service shared by all tenants:

No per-tenant configuration needed (unlike LLM providers where each tenant has their own API keys)
Comparable to other global settings such as the system email and DOC_ENGINE
Managed by administrators only
Leverages existing SettingsMgr in admin interface

Storage Strategy: Each provider's configuration stored as a single JSON object:

sandbox.provider_type - Active provider selection ("self_managed", "aliyun_codeinterpreter", "e2b")
sandbox.self_managed - JSON config for self-managed provider
sandbox.aliyun_codeinterpreter - JSON config for Aliyun Code Interpreter provider
sandbox.e2b - JSON config for E2B provider

Note: As listed here, the value field is a CharField with a 1024-character limit, which is sufficient for typical sandbox configurations. The commit that introduced this architecture changes SystemSettings.value to a TextField so that larger configurations also fit.

3.2 Configuration Schema

Each provider's configuration is stored as a single JSON object in the value field:

Self-Managed Provider

{
  "name": "sandbox.self_managed",
  "source": "variable",
  "data_type": "json",
  "value": "{\"endpoint\": \"http://localhost:9385\", \"pool_size\": 10, \"max_memory\": \"256m\", \"timeout\": 30}"
}

Aliyun Code Interpreter

{
  "name": "sandbox.aliyun_codeinterpreter",
  "source": "variable",
  "data_type": "json",
  "value": "{\"access_key_id\": \"LTAI5t...\", \"access_key_secret\": \"xxxxx\", \"account_id\": \"1234567890...\", \"region\": \"cn-hangzhou\", \"timeout\": 30}"
}

E2B

{
  "name": "sandbox.e2b",
  "source": "variable",
  "data_type": "json",
  "value": "{\"api_key\": \"e2b_sk_...\", \"region\": \"us\", \"timeout\": 30}"
}

Active Provider Selection

{
  "name": "sandbox.provider_type",
  "source": "variable",
  "data_type": "string",
  "value": "self_managed"
}
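
Reading the active configuration back from these records takes two lookups: one for the provider type and one for that provider's JSON value. The sketch below uses a plain name-to-value dictionary as a stand-in for the SystemSettings table; `resolve_active_config` is a hypothetical helper, and raising an error when the provider type is absent reflects the "no fallback" behaviour described in the commit message rather than the actual client code.

```python
import json
from typing import Any, Dict, Tuple


def resolve_active_config(records: Dict[str, str]) -> Tuple[str, Dict[str, Any]]:
    """Return (provider_type, config) for the active sandbox provider."""
    provider_type = records.get("sandbox.provider_type")
    if not provider_type:
        # No fallback default: the setting must exist.
        raise RuntimeError("Sandbox provider type not configured")

    raw_config = records.get(f"sandbox.{provider_type}")
    if raw_config is None:
        raise RuntimeError(f"No configuration stored for provider {provider_type!r}")

    # Each provider's settings are stored as one JSON object in the value field.
    return provider_type, json.loads(raw_config)


records = {
    "sandbox.provider_type": "self_managed",
    "sandbox.self_managed": '{"endpoint": "http://localhost:9385", "pool_size": 10}',
}
provider_type, config = resolve_active_config(records)
print(provider_type, config["endpoint"])
```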

3.3 Provider Self-Describing Schema

Each provider class implements a static method to describe its configuration schema:

# agent/sandbox/providers/base.py

class SandboxProvider(ABC):
    """Base interface for all sandbox providers"""

    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> bool:
        """Initialize provider with configuration"""
        pass

    @abstractmethod
    def create_instance(self, template: str = "python") -> SandboxInstance:
        """Create a new sandbox instance"""
        pass

    @abstractmethod
    def execute_code(
        self,
        instance_id: str,
        code: str,
        language: str,
        timeout: int = 10
    ) -> ExecutionResult:
        """Execute code in the sandbox"""
        pass

    @abstractmethod
    def destroy_instance(self, instance_id: str) -> bool:
        """Destroy a sandbox instance"""
        pass

    @abstractmethod
    def health_check(self) -> bool:
        """Check if provider is healthy"""
        pass

    @abstractmethod
    def get_supported_languages(self) -> list[str]:
        """Get list of supported programming languages"""
        pass

    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        """Return configuration schema for this provider"""
        return {}

Example Implementation:

# agent/sandbox/providers/self_managed.py

class SelfManagedProvider(SandboxProvider):
    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        return {
            "endpoint": {
                "type": "string",
                "required": True,
                "label": "API Endpoint",
                "placeholder": "http://localhost:9385"
            },
            "pool_size": {
                "type": "integer",
                "default": 10,
                "label": "Container Pool Size",
                "min": 1,
                "max": 100
            },
            "max_memory": {
                "type": "string",
                "default": "256m",
                "label": "Max Memory per Container",
                "options": ["128m", "256m", "512m", "1g"]
            },
            "timeout": {
                "type": "integer",
                "default": 30,
                "label": "Execution Timeout (seconds)",
                "min": 5,
                "max": 300
            }
        }

# agent/sandbox/providers/aliyun_codeinterpreter.py

class AliyunCodeInterpreterProvider(SandboxProvider):
    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        return {
            "access_key_id": {
                "type": "string",
                "required": True,
                "secret": True,
                "label": "Access Key ID",
                "description": "Aliyun AccessKey ID for authentication"
            },
            "access_key_secret": {
                "type": "string",
                "required": True,
                "secret": True,
                "label": "Access Key Secret",
                "description": "Aliyun AccessKey Secret for authentication"
            },
            "account_id": {
                "type": "string",
                "required": True,
                "label": "Account ID",
                "description": "Aliyun primary account ID (主账号ID), required for API calls"
            },
            "region": {
                "type": "string",
                "default": "cn-hangzhou",
                "label": "Region",
                "options": ["cn-hangzhou", "cn-beijing", "cn-shanghai", "cn-shenzhen", "cn-guangzhou"],
                "description": "Aliyun region for Code Interpreter service"
            },
            "template_name": {
                "type": "string",
                "required": False,
                "label": "Template Name",
                "description": "Optional sandbox template name for pre-configured environments"
            },
            "timeout": {
                "type": "integer",
                "default": 30,
                "label": "Execution Timeout (seconds)",
                "min": 1,
                "max": 30,
                "description": "Code execution timeout (max 30 seconds - hard limit)"
            }
        }

# agent/sandbox/providers/e2b.py

class E2BProvider(SandboxProvider):
    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        return {
            "api_key": {
                "type": "string",
                "required": True,
                "secret": True,
                "label": "API Key"
            },
            "region": {
                "type": "string",
                "default": "us",
                "label": "Region",
                "options": ["us", "eu"]
            },
            "timeout": {
                "type": "integer",
                "default": 30,
                "label": "Execution Timeout (seconds)",
                "min": 5,
                "max": 300
            }
        }

Benefits of Self-Describing Providers:

Single source of truth - schema defined alongside implementation
Easy to add new providers - no central registry to update
Type safety - schema stays in sync with provider code
Flexible - frontend can use schema for validation or hardcode if preferred
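
A schema in this shape is enough to drive generic validation on the backend as well. The following sketch shows one possible validator; `validate_against_schema` and `ValidationResult` are hypothetical names, and only the schema keys used in the examples above (type, required, min, max, options) are handled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Maps schema type names to the Python types accepted for them.
_PYTHON_TYPES = {"string": str, "integer": int}


@dataclass
class ValidationResult:
    errors: List[str] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not self.errors


def validate_against_schema(config: Dict[str, Any], schema: Dict[str, Dict]) -> ValidationResult:
    """Check required fields, types, numeric ranges, and allowed options."""
    result = ValidationResult()
    for name, spec in schema.items():
        value = config.get(name)
        if value is None or value == "":
            if spec.get("required"):
                result.errors.append(f"{name}: required")
            continue

        expected = _PYTHON_TYPES.get(spec.get("type", ""))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            result.errors.append(f"{name}: expected {spec['type']}")
            continue

        if isinstance(value, int):
            if "min" in spec and value < spec["min"]:
                result.errors.append(f"{name}: must be >= {spec['min']}")
            if "max" in spec and value > spec["max"]:
                result.errors.append(f"{name}: must be <= {spec['max']}")
        if "options" in spec and value not in spec["options"]:
            result.errors.append(f"{name}: must be one of {spec['options']}")
    return result


# Excerpt of the self-managed schema shown above.
schema = {
    "endpoint": {"type": "string", "required": True},
    "pool_size": {"type": "integer", "default": 10, "min": 1, "max": 100},
    "max_memory": {"type": "string", "options": ["128m", "256m", "512m", "1g"]},
}
outcome = validate_against_schema({"pool_size": 500, "max_memory": "2g"}, schema)
print(outcome.valid, outcome.errors)
```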

3.4 Admin API Endpoints

Follow existing pattern in admin/server/routes.py and use SettingsMgr:

# admin/server/routes.py (add new endpoints)

from flask import request, jsonify
import json
from api.db.services.system_settings_service import SystemSettingsService
from agent.sandbox.providers.self_managed import SelfManagedProvider
from agent.sandbox.providers.aliyun_codeinterpreter import AliyunCodeInterpreterProvider
from agent.sandbox.providers.e2b import E2BProvider
from admin.server.services import SettingsMgr

# Map provider IDs to their classes
PROVIDER_CLASSES = {
    "self_managed": SelfManagedProvider,
    "aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
    "e2b": E2BProvider,
}

@admin_bp.route('/api/admin/sandbox/providers', methods=['GET'])
def list_sandbox_providers():
    """List available sandbox providers with their schemas"""
    providers = []
    for provider_id, provider_class in PROVIDER_CLASSES.items():
        schema = provider_class.get_config_schema()
        providers.append({
            "id": provider_id,
            "name": provider_id.replace("_", " ").title(),
            "config_schema": schema
        })
    return jsonify({"data": providers})

@admin_bp.route('/api/admin/sandbox/config', methods=['GET'])
def get_sandbox_config():
    """Get current sandbox configuration"""
    # Get active provider
    active_provider_setting = SystemSettingsService.get_by_name("sandbox.provider_type")
    active_provider = active_provider_setting[0].value if active_provider_setting else None

    config = {"active": active_provider}

    # Load all provider configs
    for provider_id in PROVIDER_CLASSES.keys():
        setting = SystemSettingsService.get_by_name(f"sandbox.{provider_id}")
        if setting:
            try:
                config[provider_id] = json.loads(setting[0].value)
            except json.JSONDecodeError:
                config[provider_id] = {}
        else:
            # Return default values from schema
            provider_class = PROVIDER_CLASSES[provider_id]
            schema = provider_class.get_config_schema()
            config[provider_id] = {
                key: field_def.get("default", "")
                for key, field_def in schema.items()
            }

    return jsonify({"data": config})

@admin_bp.route('/api/admin/sandbox/config', methods=['POST'])
def set_sandbox_config():
    """
    Update sandbox provider configuration.

    Request Parameters:
    - provider_type: Provider identifier (e.g., "self_managed", "e2b")
    - config: Provider configuration dictionary
    - set_active: (optional) If True, also set this provider as active.
                  Default: True for backward compatibility.
                  Set to False to update config without switching providers.
    - test_connection: (optional) If True, test connection before saving

    Response: Success message
    """
    req = request.json or {}
    provider_type = req.get('provider_type')
    config = req.get('config') or {}
    set_active = req.get('set_active', True)  # Default to True

    # Validate provider exists
    if provider_type not in PROVIDER_CLASSES:
        return jsonify({"error": "Unknown provider"}), 400

    # Validate configuration against schema
    provider_class = PROVIDER_CLASSES[provider_type]
    schema = provider_class.get_config_schema()
    validation_result = validate_config(config, schema)
    if not validation_result.valid:
        return jsonify({"error": "Invalid config", "details": validation_result.errors}), 400

    # Test connection if requested
    if req.get('test_connection'):
        test_result = test_provider_connection(provider_type, config)
        if not test_result.success:
            return jsonify({"error": "Connection failed", "details": test_result.error}), 400

    # Store entire config as a single JSON record
    config_json = json.dumps(config)
    setting_name = f"sandbox.{provider_type}"

    existing = SystemSettingsService.get_by_name(setting_name)
    if existing:
        SettingsMgr.update_by_name(setting_name, config_json)
    else:
        SystemSettingsService.save(
            name=setting_name,
            source="variable",
            data_type="json",
            value=config_json
        )

    # Set as active provider if requested (default: True)
    if set_active:
        SettingsMgr.update_by_name("sandbox.provider_type", provider_type)

    return jsonify({"message": "Configuration saved"})

@admin_bp.route('/api/admin/sandbox/test', methods=['POST'])
def test_sandbox_connection():
    """Test connection to sandbox provider"""
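    req = request.json or {}
    provider_type = req.get('provider_type')
    config = req.get('config') or {}

    # Reject unknown providers before attempting a connection
    if provider_type not in PROVIDER_CLASSES:
        return jsonify({"error": "Unknown provider"}), 400

    test_result = test_provider_connection(provider_type, config)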
    return jsonify({
        "success": test_result.success,
        "message": test_result.message,
        "latency_ms": test_result.latency_ms
    })

@admin_bp.route('/api/admin/sandbox/active', methods=['PUT'])
def set_active_sandbox_provider():
    """Set active sandbox provider"""
    provider_name = request.json.get('provider')

    if provider_name not in PROVIDER_CLASSES:
        return jsonify({"error": "Unknown provider"}), 400

    # Check if provider is configured
    provider_setting = SystemSettingsService.get_by_name(f"sandbox.{provider_name}")
    if not provider_setting:
        return jsonify({"error": "Provider not configured"}), 400

    SettingsMgr.update_by_name("sandbox.provider_type", provider_name)
    return jsonify({"message": "Active provider updated"})
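
The endpoints above call `validate_config(config, schema)` and read `.valid` and `.errors` from its result, but the helper itself is not shown. A minimal sketch of what such a helper might look like follows, assuming the schema fields used elsewhere in this document (`type`, `required`, `min`, `max`, `options`). The `ValidationResult` dataclass and the error message wording are assumptions for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ValidationResult:
    """Hypothetical result type exposing the `.valid` / `.errors` attributes used above."""
    valid: bool
    errors: List[str] = field(default_factory=list)


# Maps schema "type" strings to the Python types accepted for them.
_TYPE_MAP = {"string": str, "integer": int, "boolean": bool}


def validate_config(config: Dict[str, Any], schema: Dict[str, Dict[str, Any]]) -> ValidationResult:
    """Check a provider config against a self-describing schema."""
    if not isinstance(config, dict):
        return ValidationResult(False, ["config must be an object"])

    errors: List[str] = []
    for key, field_def in schema.items():
        value = config.get(key)
        # Missing or empty values are only an error for required fields.
        if value is None or value == "":
            if field_def.get("required"):
                errors.append(f"{key}: required field is missing")
            continue

        type_name = field_def.get("type", "string")
        expected = _TYPE_MAP.get(type_name, str)
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            errors.append(f"{key}: expected {type_name}")
            continue

        if expected is int:
            if "min" in field_def and value < field_def["min"]:
                errors.append(f"{key}: must be >= {field_def['min']}")
            if "max" in field_def and value > field_def["max"]:
                errors.append(f"{key}: must be <= {field_def['max']}")

        options = field_def.get("options")
        if options and value not in options:
            errors.append(f"{key}: must be one of {options}")

    return ValidationResult(valid=not errors, errors=errors)
```

Collecting every error instead of stopping at the first lets the admin UI highlight all invalid fields in one round trip.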

4. Frontend Integration

4.1 Admin Settings UI

Location: web/src/pages/SandboxSettings/index.tsx

import React, { useState } from 'react';
import { Form, Select, Input, InputNumber, Button, Card, Space, Tag, message } from 'antd';
import { listSandboxProviders, getSandboxConfig, setSandboxConfig, testSandboxConnection } from '@/utils/api';

const { Option } = Select;

const SandboxSettings: React.FC = () => {
  const [providers, setProviders] = useState<Provider[]>([]);
  const [configs, setConfigs] = useState<Config[]>([]);
  const [selectedProvider, setSelectedProvider] = useState<string>('');
  const [testing, setTesting] = useState(false);

  const providerSchema = providers.find(p => p.id === selectedProvider);

  const renderConfigForm = () => {
    if (!providerSchema) return null;

    return (
      <Form layout="vertical">
        {Object.entries(providerSchema.config_schema).map(([key, schema]) => (
          <Form.Item
            key={key}
            name={key}
            label={schema.label}
            rules={[{ required: schema.required }]}
          >
            {schema.secret ? (
              <Input.Password placeholder={schema.placeholder} />
            ) : schema.type === 'integer' ? (
              <InputNumber min={schema.min} max={schema.max} />
            ) : schema.options ? (
              <Select>
                {schema.options.map((opt: string) => (
                  <Option key={opt} value={opt}>{opt}</Option>
                ))}
              </Select>
            ) : (
              <Input placeholder={schema.placeholder} />
            )}
          </Form.Item>
        ))}
      </Form>
    );
  };

  return (
    <Card title="Sandbox Provider Configuration">
      <Space direction="vertical" style={{ width: '100%' }}>
        {/* Provider Selection */}
        <Form.Item label="Select Provider">
          <Select
            style={{ width: '100%' }}
            onChange={setSelectedProvider}
            value={selectedProvider}
          >
            {providers.map(provider => (
              <Option key={provider.id} value={provider.id}>
                <Space>
                  <Icon type={provider.icon} />
                  {provider.name}
                  {provider.tags.map(tag => (
                    <Tag key={tag}>{tag}</Tag>
                  ))}
                </Space>
              </Option>
            ))}
          </Select>
        </Form.Item>

        {/* Dynamic Configuration Form */}
        {renderConfigForm()}

        {/* Actions */}
        <Space>
          <Button type="primary" onClick={handleSave}>
            Save Configuration
          </Button>
          <Button onClick={handleTest} loading={testing}>
            Test Connection
          </Button>
        </Space>
      </Space>
    </Card>
  );
};

4.2 API Client

File: web/src/utils/api.ts

export async function listSandboxProviders() {
  return request<{ data: Provider[] }>('/api/admin/sandbox/providers');
}

export async function getSandboxConfig() {
  return request<{ data: SandboxConfig }>('/api/admin/sandbox/config');
}

export async function setSandboxConfig(config: SandboxConfigRequest) {
  return request('/api/admin/sandbox/config', {
    method: 'POST',
    data: config,
  });
}

export async function testSandboxConnection(provider: string, config: Record<string, unknown>) {
  return request('/api/admin/sandbox/test', {
    method: 'POST',
    // The backend endpoint reads `provider_type`, not `provider`
    data: { provider_type: provider, config },
  });
}

export async function setActiveSandboxProvider(provider: string) {
  return request('/api/admin/sandbox/active', {
    method: 'PUT',
    data: { provider },
  });
}

4.3 Type Definitions

File: web/src/types/sandbox.ts

interface Provider {
  id: string;
  name: string;
  description: string;
  icon: string;
  tags: string[];
  config_schema: Record<string, ConfigField>;
  supported_languages: string[];
}

interface ConfigField {
  type: 'string' | 'integer' | 'boolean';
  required: boolean;
  secret?: boolean;
  label: string;
  placeholder?: string;
  default?: any;
  options?: string[];
  min?: number;
  max?: number;
}

// Configuration response grouped by provider
interface SandboxConfig {
  active: string | null;  // Currently active provider (null if none is set)
  self_managed?: Record<string, string | number | boolean>;
  aliyun_codeinterpreter?: Record<string, string | number | boolean>;
  e2b?: Record<string, string | number | boolean>;
  // Add more providers as needed
}

// Request to update provider configuration
interface SandboxConfigRequest {
  provider_type: string;
  config: Record<string, string | number | boolean>;
  test_connection?: boolean;
  set_active?: boolean;
}

5. Integration with Agent System

5.1 Agent Component Usage

The agent system will use the sandbox through the simplified provider manager, loading global configuration from SystemSettings:

# In agent/components/code_executor.py

import json
from agent.sandbox.providers.manager import ProviderManager
from agent.sandbox.providers.self_managed import SelfManagedProvider
from agent.sandbox.providers.aliyun_codeinterpreter import AliyunCodeInterpreterProvider
from agent.sandbox.providers.e2b import E2BProvider
from api.db.services.system_settings_service import SystemSettingsService

# Map provider IDs to their classes
PROVIDER_CLASSES = {
    "self_managed": SelfManagedProvider,
    "aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
    "e2b": E2BProvider,
}

class CodeExecutorComponent:
    def __init__(self):
        self.provider_manager = ProviderManager()
        self._load_active_provider()

    def _load_active_provider(self):
        """Load the active provider from system settings"""
        # Get active provider
        active_setting = SystemSettingsService.get_by_name("sandbox.provider_type")
        if not active_setting:
            raise RuntimeError("No sandbox provider configured")

        active_provider = active_setting[0].value

        # Load configuration for active provider (single JSON record)
        provider_setting = SystemSettingsService.get_by_name(f"sandbox.{active_provider}")
        if not provider_setting:
            raise RuntimeError(f"Sandbox provider {active_provider} not configured")

        # Parse JSON configuration
        try:
            config = json.loads(provider_setting[0].value)
        except json.JSONDecodeError as e:
            raise RuntimeError(f"Invalid sandbox configuration for {active_provider}: {e}")

        # Get provider class
        provider_class = PROVIDER_CLASSES.get(active_provider)
        if not provider_class:
            raise RuntimeError(f"Unknown provider: {active_provider}")

        # Initialize provider
        provider = provider_class()
        provider.initialize(config)

        # Set as active provider in manager
        self.provider_manager.set_provider(active_provider, provider)

    def execute(self, code: str, language: str) -> ExecutionResult:
        """Execute code using the active provider"""
        provider = self.provider_manager.get_provider()

        if not provider:
            raise RuntimeError("No sandbox provider configured")

        # Create instance
        instance = provider.create_instance(template=language)

        try:
            # Execute code
            result = provider.execute_code(
                instance_id=instance.instance_id,
                code=code,
                language=language
            )
            return result
        finally:
            # Always cleanup
            provider.destroy_instance(instance.instance_id)

6. Security Considerations

6.1 Credential Storage

Sensitive credentials (API keys, secrets) encrypted at rest in database
Use RAGFlow's existing encryption mechanisms (AES-256)
Never log or expose credentials in error messages or API responses
Credentials redacted in UI (show only last 4 characters)
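
The redaction rule above (show only the last 4 characters) can be sketched as a small helper applied before a configuration leaves the server. This is an illustrative sketch only: `redact_secret` and `redact_config` are hypothetical names, and the schema is assumed to mark sensitive fields with `"secret": True` as in the provider schemas in this document.

```python
from typing import Any, Dict


def redact_secret(value: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters of a secret."""
    if not value:
        return ""
    # Very short secrets are masked entirely so nothing useful leaks.
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def redact_config(config: Dict[str, Any], schema: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of `config` with every schema field marked secret redacted."""
    return {
        key: redact_secret(str(value)) if schema.get(key, {}).get("secret") else value
        for key, value in config.items()
    }
```

For example, `redact_config({"api_key": "sk-abcdef1234", "region": "us"}, schema)` would leave `region` untouched and keep only `1234` visible in `api_key`.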

6.2 Tenant Isolation

Configuration: Global sandbox settings shared by all tenants (admin-only access)
Execution: Sandboxes never shared across tenants/sessions during runtime
Instance IDs: Scoped to tenant: {tenant_id}:{session_id}:{instance_id}
Network Isolation: Between tenant sandboxes (VPC per tenant for SaaS providers)
Resource Quotas: Per-tenant limits on concurrent executions, total execution time
Audit Logging: All sandbox executions logged with tenant_id for traceability
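
The scoped instance ID format `{tenant_id}:{session_id}:{instance_id}` can be illustrated with a pair of helpers that build and parse it. The function names here are hypothetical, and the sketch assumes none of the three parts may contain a colon.

```python
from typing import NamedTuple


class ScopedInstanceId(NamedTuple):
    tenant_id: str
    session_id: str
    instance_id: str


def scope_instance_id(tenant_id: str, session_id: str, instance_id: str) -> str:
    """Build a tenant-scoped ID of the form tenant:session:instance."""
    for part in (tenant_id, session_id, instance_id):
        # A colon inside a part would make the ID ambiguous to parse.
        if not part or ":" in part:
            raise ValueError(f"invalid ID component: {part!r}")
    return f"{tenant_id}:{session_id}:{instance_id}"


def parse_scoped_id(scoped: str) -> ScopedInstanceId:
    """Split a scoped ID back into its three components."""
    parts = scoped.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed scoped instance ID: {scoped!r}")
    return ScopedInstanceId(*parts)


def belongs_to_tenant(scoped: str, tenant_id: str) -> bool:
    """Check ownership before allowing any operation on an instance."""
    return parse_scoped_id(scoped).tenant_id == tenant_id
```

Checking `belongs_to_tenant` before every `execute_code` or `destroy_instance` call is one way to enforce that sandboxes are never shared across tenants.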

6.3 Resource Limits

Timeout limits per execution (configurable per provider, default 30s)
Memory/CPU limits enforced at provider level
Automatic cleanup of stale instances (max lifetime: 5 minutes)
Rate limiting per tenant (max concurrent executions: 10)
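
The per-tenant concurrency limit above (max 10 concurrent executions) could be enforced in-process along the following lines. This is a sketch under the assumption of a single server process; `TenantConcurrencyLimiter` is a hypothetical name, and a multi-process deployment would need a shared store instead of an in-memory counter.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class TenantConcurrencyLimiter:
    """Tracks running executions per tenant and rejects work above the limit."""

    def __init__(self, max_concurrent: int = 10) -> None:
        self._max = max_concurrent
        self._lock = threading.Lock()
        self._active: Dict[str, int] = defaultdict(int)

    def try_acquire(self, tenant_id: str) -> bool:
        """Reserve a slot; returns False if the tenant is at its limit."""
        with self._lock:
            if self._active[tenant_id] >= self._max:
                return False
            self._active[tenant_id] += 1
            return True

    def release(self, tenant_id: str) -> None:
        """Free a previously acquired slot."""
        with self._lock:
            if self._active[tenant_id] > 0:
                self._active[tenant_id] -= 1

    @contextmanager
    def slot(self, tenant_id: str) -> Iterator[None]:
        """Context manager that always releases the slot, even on error."""
        if not self.try_acquire(tenant_id):
            raise RuntimeError(f"rate limit exceeded for tenant {tenant_id}")
        try:
            yield
        finally:
            self.release(tenant_id)
```

Wrapping each execution in `with limiter.slot(tenant_id):` keeps the counter accurate when user code raises or times out.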

6.4 Code Security

For self-managed: AST-based security analysis before execution
Blocked operations: file system writes, network calls, system commands
Allowlist approach: only specific imports allowed
Runtime monitoring for malicious patterns
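
To make the allowlist idea concrete, here is a minimal sketch of an AST-based pre-execution check using Python's standard `ast` module. The allowlist and blocked-call sets are placeholder assumptions, not RAGFlow's actual policy, and a static check like this is only one layer: it does not replace runtime isolation such as gVisor.

```python
import ast
from typing import List, Set

# Hypothetical policy values for illustration only.
ALLOWED_IMPORTS: Set[str] = {"json", "math", "re", "datetime"}
BLOCKED_CALLS: Set[str] = {"eval", "exec", "open", "__import__"}


def find_violations(source: str) -> List[str]:
    """Return a list of policy violations found in `source` (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # Compare the top-level package, e.g. "os" for "os.path".
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import of '{alias.name}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"import from '{module}' is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to '{node.func.id}' is blocked")
    return violations
```

A caller would reject the submission when the returned list is non-empty, which corresponds to the "Code blocked by security policy" error case.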

6.5 Network Security

Self-managed: Network isolation by default, no external access
SaaS: HTTPS only, certificate pinning
IP whitelisting for self-managed endpoint access

7. Monitoring and Observability

7.1 Metrics to Track

Common Metrics (All Providers):

Execution success rate (target: >95%)
Average execution time (p50, p95, p99)
Error rate by error type
Active instance count
Queue depth (for self-managed pool)

Self-Managed Specific:

Container pool utilization (target: 60-80%)
Host resource usage (CPU, memory, disk)
Container creation latency
Container restart rate
gVisor runtime health

SaaS Specific:

API call latency by region
Rate limit usage and throttling events
Cost estimation (execution count × unit cost)
Provider availability (uptime %)
API error rate by error code

7.2 Logging

Structured logging for all provider operations:

{
  "timestamp": "2025-01-26T10:00:00Z",
  "tenant_id": "tenant_123",
  "provider": "aliyun_codeinterpreter",
  "operation": "execute_code",
  "instance_id": "inst_xyz",
  "language": "python",
  "code_hash": "sha256:...",
  "duration_ms": 1234,
  "status": "success",
  "exit_code": 0,
  "memory_used_mb": 64,
  "region": "cn-hangzhou"
}
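
A record shaped like the one above can be assembled with the standard library. The sketch below is illustrative: `build_log_record` is a hypothetical helper, and the `"error"` status value for non-zero exit codes is an assumption, since the document only shows the `"success"` case. Hashing the code rather than logging it keeps user source out of the logs.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_log_record(
    tenant_id: str,
    provider: str,
    operation: str,
    instance_id: str,
    language: str,
    code: str,
    duration_ms: int,
    exit_code: int,
    **extra: Any,
) -> Dict[str, Any]:
    """Build a structured log record; the code is hashed, never logged verbatim."""
    record: Dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "tenant_id": tenant_id,
        "provider": provider,
        "operation": operation,
        "instance_id": instance_id,
        "language": language,
        "code_hash": "sha256:" + hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "duration_ms": duration_ms,
        # "error" is an assumed status value for non-zero exit codes.
        "status": "success" if exit_code == 0 else "error",
        "exit_code": exit_code,
    }
    # Provider-specific fields such as memory_used_mb or region.
    record.update(extra)
    return record


if __name__ == "__main__":
    rec = build_log_record(
        "tenant_123", "aliyun_codeinterpreter", "execute_code", "inst_xyz",
        "python", "print('hi')", 1234, 0, region="cn-hangzhou",
    )
    print(json.dumps(rec))
```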

7.3 Alerts

Critical Alerts:

Provider availability < 99%
Error rate > 5%
Average execution time > 10s
Container pool exhaustion (0 available)

Warning Alerts:

Cost spike (2x daily average)
Rate limit approaching (>80%)
High memory usage (>90%)
Slow execution times (p95 > 5s)
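
The thresholds above translate directly into a rule check over a metrics snapshot. In this sketch the metric key names and alert labels are hypothetical; only the threshold values come from the lists above.

```python
from typing import Dict, List


def evaluate_alerts(m: Dict[str, float]) -> Dict[str, List[str]]:
    """Compare a metrics snapshot against the critical and warning thresholds."""
    critical: List[str] = []
    warning: List[str] = []

    # Critical thresholds
    if m["availability_pct"] < 99:
        critical.append("provider_availability")
    if m["error_rate_pct"] > 5:
        critical.append("error_rate")
    if m["avg_execution_s"] > 10:
        critical.append("avg_execution_time")
    if m["pool_available"] == 0:
        critical.append("pool_exhausted")

    # Warning thresholds
    if m["cost_today"] > 2 * m["cost_daily_avg"]:
        warning.append("cost_spike")
    if m["rate_limit_usage_pct"] > 80:
        warning.append("rate_limit_approaching")
    if m["memory_usage_pct"] > 90:
        warning.append("high_memory")
    if m["p95_execution_s"] > 5:
        warning.append("slow_p95")

    return {"critical": critical, "warning": warning}
```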

8. Migration Path

8.1 Phase 1: Refactor Existing Code (Week 1-2)

Goals: Extract current implementation into provider pattern

Tasks:

Create agent/sandbox/providers/base.py with SandboxProvider interface
Implement agent/sandbox/providers/self_managed.py wrapping executor_manager
Create agent/sandbox/providers/manager.py for provider management
Write unit tests for self-managed provider
Document existing behavior and configuration

Deliverables:

Provider abstraction layer
Self-managed provider implementation
Unit test suite

8.2 Phase 2: Database Integration (Week 3)

Goals: Add sandbox configuration to admin system

Tasks:

Add sandbox entries to conf/system_settings.json initialization file
Extend SettingsMgr in admin/server/services.py with sandbox-specific methods
Add admin endpoints to admin/server/routes.py
Implement configuration validation logic
Add provider connection testing
Write API tests

Deliverables:

SystemSettings integration
Admin API endpoints (/api/admin/sandbox/*)
Configuration validation
API test suite

8.3 Phase 3: Frontend UI (Week 4)

Goals: Build admin settings interface

Tasks:

Create web/src/pages/SandboxSettings/index.tsx
Implement dynamic form generation from provider schema
Add connection testing UI
Create TypeScript types
Write frontend tests

Deliverables:

Admin settings UI
Type definitions
Frontend test suite

8.4 Phase 4: SaaS Provider Implementation (Week 5-6)

Goals: Implement Aliyun Code Interpreter and E2B providers

Tasks:

Implement agent/sandbox/providers/aliyun_codeinterpreter.py
Implement agent/sandbox/providers/e2b.py
Add provider-specific tests with mocking
Document provider-specific behaviors
Create provider setup guides

Deliverables:

Aliyun Code Interpreter provider
E2B provider
Provider documentation

8.5 Phase 5: Agent Integration (Week 7)

Goals: Update agent components to use new provider system

Tasks:

Update agent/components/code_executor.py to use ProviderManager
Implement fallback logic
Add tenant-specific provider loading
Update agent tests
Performance testing

Deliverables:

Agent integration
Fallback mechanism
Updated test suite

8.6 Phase 6: Monitoring & Documentation (Week 8)

Goals: Add observability and complete documentation

Tasks:

Implement metrics collection
Add structured logging
Configure alerts
Write deployment guide
Write user documentation
Create troubleshooting guide

Deliverables:

Monitoring dashboards
Complete documentation
Deployment guides

9. Testing Strategy

9.1 Unit Tests

Provider Tests (test/agent/sandbox/providers/test_*.py):

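# Shared configuration for the tests below
TEST_CONFIG = {"endpoint": "http://localhost:9385"}

class TestSelfManagedProvider:
    def test_initialize_with_config(self):
        provider = SelfManagedProvider()
        assert provider.initialize(TEST_CONFIG)

    def test_create_python_instance(self):
        provider = SelfManagedProvider()
        provider.initialize(TEST_CONFIG)
        instance = provider.create_instance("python")
        assert instance.status == "running"

    def test_execute_code(self):
        provider = SelfManagedProvider()
        provider.initialize(TEST_CONFIG)
        # An instance must exist before code can be executed in it
        instance = provider.create_instance("python")
        try:
            result = provider.execute_code(instance.instance_id, "print('hello')", "python")
            assert result.exit_code == 0
            assert "hello" in result.stdout
        finally:
            provider.destroy_instance(instance.instance_id)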

Configuration Tests:

Test configuration validation for each provider schema
Test error handling for invalid configurations
Test secret field redaction

9.2 Integration Tests

Provider Switching:

Test switching between providers
Test fallback mechanism
Test concurrent provider usage

Multi-Tenant Isolation:

Test tenant configuration isolation
Test instance ID scoping
Test resource separation

Admin API Tests:

Test CRUD operations for configurations
Test connection testing endpoint
Test validation error responses

9.3 E2E Tests

Complete Flow Tests:

def test_sandbox_execution_flow():
    # 1. Configure provider via admin API (test helper wrapping POST /api/admin/sandbox/config)
    set_sandbox_config(provider_type="self_managed", config={...})

    # 2. Create agent task with code execution
    task = create_agent_task(code="print('test')")

    # 3. Execute task
    result = execute_agent_task(task.id)

    # 4. Verify result
    assert result.status == "success"
    assert "test" in result.output

    # 5. Verify sandbox cleanup
    assert get_active_instances() == 0

Admin UI Tests:

Test provider configuration flow
Test connection testing
Test error handling in UI

9.4 Performance Tests

Load Testing:

Test 100 concurrent executions
Test pool exhaustion behavior
Test queue performance (self-managed)

Latency Testing:

Measure cold start time per provider
Measure execution latency percentiles
Compare provider performance

10. Cost Considerations

10.1 Self-Managed Costs

Infrastructure:

Server hosting: $X/month (depends on specs)
Maintenance: engineering time
Scaling: manual, requires additional servers

Pros:

Predictable costs
No per-execution fees
Full control over resources

Cons:

High initial setup cost
Operational overhead
Finite capacity

10.2 SaaS Costs

Aliyun Code Interpreter (estimated):

Pricing: execution time × memory configuration
Example: 1000 executions/day × 30s × $0.01/1000s = ~$0.30/day

E2B (estimated):

Pricing: $0.02/execution-second
Example: 1000 executions/day × 30s × $0.02/s = ~$600/day
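
The two estimates above follow the same formula: executions per day × seconds per execution × price per second. A short sketch reproduces the arithmetic, using only the example rates quoted in this section (which the document itself marks as estimates, not published prices).

```python
def daily_cost(executions_per_day: int, seconds_per_execution: float, price_per_second: float) -> float:
    """Estimated daily cost = executions x duration x unit price."""
    return executions_per_day * seconds_per_execution * price_per_second


# Aliyun example: $0.01 per 1000 execution-seconds
aliyun_daily = daily_cost(1000, 30, 0.01 / 1000)

# E2B example: $0.02 per execution-second
e2b_daily = daily_cost(1000, 30, 0.02)

if __name__ == "__main__":
    print(f"Aliyun: ~${aliyun_daily:.2f}/day")
    print(f"E2B:    ~${e2b_daily:.2f}/day")
```

At these example rates the same workload differs by a factor of 2000 between the two providers, so the unit price deserves verification against current provider pricing before choosing a default.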

Pros:

No upfront costs
Automatic scaling
No maintenance

Cons:

Variable costs (can spike with usage)
Network dependency
Potential for runaway costs

10.3 Cost Optimization

Recommendations:

Hybrid Approach: Use self-managed for base load, SaaS for spikes
Cost Monitoring: Set budget alerts per tenant
Resource Limits: Enforce max executions per tenant/day
Caching: Reuse instances when possible (self-managed pool)
Smart Routing: Route to cheapest provider based on availability

11. Future Extensibility

The architecture supports easy addition of new providers:

11.1 Adding a New Provider

Step 1: Implement provider class with schema

# agent/sandbox/providers/new_provider.py
from typing import Any, Dict

from .base import SandboxProvider

class NewProvider(SandboxProvider):
    @staticmethod
    def get_config_schema() -> Dict[str, Dict]:
        return {
            "api_key": {
                "type": "string",
                "required": True,
                "secret": True,
                "label": "API Key"
            },
            "region": {
                "type": "string",
                "default": "us-east-1",
                "label": "Region"
            }
        }

    def initialize(self, config: Dict[str, Any]) -> bool:
        self.api_key = config.get("api_key")
        self.region = config.get("region", "us-east-1")
        # Initialize client
        return True

    # Implement other abstract methods...

Step 2: Register in provider mapping

# In api/apps/sandbox_app.py or wherever providers are listed
from agent.sandbox.providers.new_provider import NewProvider

PROVIDER_CLASSES = {
    "self_managed": SelfManagedProvider,
    "aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
    "e2b": E2BProvider,
    "new_provider": NewProvider,  # Add here
}

No central registry to update - just import and add to the mapping!

11.2 Potential Future Providers

GitHub Codespaces: For GitHub-integrated workflows
Gitpod: Cloud development environments
CodeSandbox: Frontend code execution
AWS Firecracker: Raw microVM management
Custom Provider: User-defined provider implementations

11.3 Advanced Features

Feature Pooling:

Share instances across executions (same language, same user)
Warm pool for reduced latency
Instance hibernation for cost savings

Feature Multi-Region:

Route to nearest region
Failover across regions
Regional cost optimization

Feature Hybrid Execution:

Split workloads between providers
Dynamic provider selection based on cost/performance
A/B testing for provider performance

12. Appendix

12.1 Configuration Examples

SystemSettings Initialization File (conf/system_settings.json - add these entries):

{
  "system_settings": [
    {
      "name": "sandbox.provider_type",
      "source": "variable",
      "data_type": "string",
      "value": "self_managed"
    },
    {
      "name": "sandbox.self_managed",
      "source": "variable",
      "data_type": "json",
      "value": "{\"endpoint\": \"http://sandbox-internal:9385\", \"pool_size\": 20, \"max_memory\": \"512m\", \"timeout\": 60, \"enable_seccomp\": true, \"enable_ast_analysis\": true}"
    },
    {
      "name": "sandbox.aliyun_codeinterpreter",
      "source": "variable",
      "data_type": "json",
      "value": "{\"access_key_id\": \"\", \"access_key_secret\": \"\", \"account_id\": \"\", \"region\": \"cn-hangzhou\", \"template_name\": \"\", \"timeout\": 30}"
    },
    {
      "name": "sandbox.e2b",
      "source": "variable",
      "data_type": "json",
      "value": "{\"api_key\": \"\", \"region\": \"us\", \"timeout\": 30}"
    }
  ]
}

Admin API Request Example (POST to /api/admin/sandbox/config):

{
  "provider_type": "self_managed",
  "config": {
    "endpoint": "http://sandbox-internal:9385",
    "pool_size": 20,
    "max_memory": "512m",
    "timeout": 60,
    "enable_seccomp": true,
    "enable_ast_analysis": true
  },
  "test_connection": true,
  "set_active": true
}

Note: The config object in the request is a plain JSON object. The API will serialize it to a JSON string before storing in SystemSettings.

Admin API Response Example (GET from /api/admin/sandbox/config):

{
  "data": {
    "active": "self_managed",
    "self_managed": {
      "endpoint": "http://sandbox-internal:9385",
      "pool_size": 20,
      "max_memory": "512m",
      "timeout": 60,
      "enable_seccomp": true,
      "enable_ast_analysis": true
    },
    "aliyun_codeinterpreter": {
      "access_key_id": "",
      "access_key_secret": "",
      "account_id": "",
      "region": "cn-hangzhou",
      "template_name": "",
      "timeout": 30
    },
    "e2b": {
      "api_key": "",
      "region": "us",
      "timeout": 30
    }
  }
}

Note: The response deserializes the JSON strings back to objects for easier frontend consumption.

12.2 Error Codes

Code	Description	Resolution
SB001	Provider not initialized	Configure provider in admin
SB002	Invalid configuration	Check configuration values
SB003	Connection failed	Check network and credentials
SB004	Instance creation failed	Check provider capacity
SB005	Execution timeout	Increase timeout or optimize code
SB006	Out of memory	Reduce memory usage or increase limits
SB007	Code blocked by security policy	Remove blocked imports/operations
SB008	Rate limit exceeded	Reduce concurrency or upgrade plan
SB009	Provider unavailable	Check provider status or use fallback
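
One way to carry these codes through the backend is an enum paired with an exception type, so that API responses can include the code, description, and suggested resolution together. The class names below are hypothetical; only the codes and texts come from the table above.

```python
from enum import Enum
from typing import Dict


class SandboxErrorCode(Enum):
    """Error codes from the table above: (code, description, resolution)."""
    PROVIDER_NOT_INITIALIZED = ("SB001", "Provider not initialized", "Configure provider in admin")
    INVALID_CONFIGURATION = ("SB002", "Invalid configuration", "Check configuration values")
    CONNECTION_FAILED = ("SB003", "Connection failed", "Check network and credentials")
    INSTANCE_CREATION_FAILED = ("SB004", "Instance creation failed", "Check provider capacity")
    EXECUTION_TIMEOUT = ("SB005", "Execution timeout", "Increase timeout or optimize code")
    OUT_OF_MEMORY = ("SB006", "Out of memory", "Reduce memory usage or increase limits")
    CODE_BLOCKED = ("SB007", "Code blocked by security policy", "Remove blocked imports/operations")
    RATE_LIMIT_EXCEEDED = ("SB008", "Rate limit exceeded", "Reduce concurrency or upgrade plan")
    PROVIDER_UNAVAILABLE = ("SB009", "Provider unavailable", "Check provider status or use fallback")

    def __init__(self, code: str, description: str, resolution: str) -> None:
        # Enum unpacks each member's tuple value into these attributes.
        self.code = code
        self.description = description
        self.resolution = resolution


class SandboxError(Exception):
    """Exception that carries a structured error code plus optional detail."""

    def __init__(self, error: SandboxErrorCode, detail: str = "") -> None:
        self.error = error
        self.detail = detail
        suffix = f": {detail}" if detail else ""
        super().__init__(f"[{error.code}] {error.description}{suffix}")

    def to_dict(self) -> Dict[str, str]:
        """Serializable form for API responses."""
        return {
            "code": self.error.code,
            "description": self.error.description,
            "resolution": self.error.resolution,
            "detail": self.detail,
        }
```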

12.3 References

Document Version: 1.0
Last Updated: 2025-01-26
Author: RAGFlow Team
Status: Design Specification - Ready for Review

Appendix C: Configuration Storage Considerations

Current Implementation

Storage: SystemSettings table with value field as TextField (unlimited length)
Migration: Database migration added to convert from CharField(1024) to TextField
Benefit: Supports arbitrarily long API keys, workspace IDs, and other SaaS provider credentials

Validation

Schema validation: Type checking, range validation, required field validation
Provider-specific validation: Custom validation via validate_config() method
Example: SelfManagedProvider validates URL format, timeout ranges, pool size constraints

Configuration Storage Format

Each provider's configuration is stored as JSON in SystemSettings.value:

sandbox.provider_type: Active provider selection
sandbox.self_managed: Self-managed provider JSON config
sandbox.aliyun_codeinterpreter: Aliyun provider JSON config
sandbox.e2b: E2B provider JSON config

Appendix D: Configuration Hot Reload Limitations

Current Behavior

Provider Configuration Requires Restart: When switching sandbox providers in the admin panel, the ragflow service must be restarted for changes to take effect.

Reason:

Admin and ragflow are separate processes
ragflow loads sandbox provider configuration only at startup
The get_provider_manager() function caches the provider globally
Configuration changes in MySQL are not automatically detected

Impact:

Switching from self_managed → aliyun_codeinterpreter requires ragflow restart
Updating credentials/config requires ragflow restart
Not a dynamic configuration system

Workarounds:

Production: Restart ragflow service after configuration changes:
```
cd docker
docker compose restart ragflow-server
```

Development: Use the reload_provider() function in code:

from agent.sandbox.client import reload_provider
reload_provider()  # Reloads from MySQL settings

Future Enhancement: To support hot reload without restart, implement configuration change detection:

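# In agent/sandbox/client.py
from typing import Optional

_config_timestamp: Optional[int] = None

def get_provider_manager() -> ProviderManager:
    global _provider_manager, _config_timestamp

    # Check if configuration has changed.
    # Note: only the sandbox.provider_type record is inspected, so an edit that
    # touches just a provider's own config record may go unnoticed.
    setting = SystemSettingsService.get_by_name("sandbox.provider_type")
    current_timestamp = setting[0].update_time if setting else 0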

    if _config_timestamp is None or current_timestamp > _config_timestamp:
        # Configuration changed, reload provider
        _provider_manager = None
        _load_provider_from_settings()
        _config_timestamp = current_timestamp

    return _provider_manager

However, this adds overhead on every execute_code() call. For production use, explicit restart is preferred for simplicity and reliability.
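
If the per-call overhead is the main concern, one possible middle ground is to throttle the timestamp check so the database is consulted at most once per interval. The sketch below is not part of the current implementation: `ThrottledReloader` is a hypothetical helper, and the two callables stand in for reading `update_time` from SystemSettings and for reloading the provider.

```python
import time
from typing import Callable, Optional


class ThrottledReloader:
    """Checks a settings timestamp at most once per `interval` seconds."""

    def __init__(
        self,
        read_timestamp: Callable[[], int],
        reload: Callable[[], None],
        interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._read_timestamp = read_timestamp
        self._reload = reload
        self._interval = interval
        self._clock = clock
        self._last_check: Optional[float] = None
        self._seen_timestamp: Optional[int] = None

    def maybe_reload(self) -> bool:
        """Reload if the settings changed; returns True when a reload happened."""
        now = self._clock()
        # Skip the database read entirely while inside the throttle window.
        if self._last_check is not None and now - self._last_check < self._interval:
            return False
        self._last_check = now

        current = self._read_timestamp()
        if self._seen_timestamp is None or current > self._seen_timestamp:
            self._reload()
            self._seen_timestamp = current
            return True
        return False
```

The trade-off is staleness: a configuration change can take up to `interval` seconds to be picked up, which may or may not be acceptable compared with an explicit restart.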

Appendix E: Arguments Parameter Support

Overview

All sandbox providers support passing arguments to the main() function in user code. This enables dynamic parameter injection for code execution.

Implementation Details

Base Interface:

# agent/sandbox/providers/base.py
@abstractmethod
def execute_code(
    self,
    instance_id: str,
    code: str,
    language: str,
    timeout: int = 10,
    arguments: Optional[Dict[str, Any]] = None
) -> ExecutionResult:
    """
    Execute code in the sandbox.

    The code should contain a main() function that will be called with:
    - Python: main(**arguments) if arguments provided, else main()
    - JavaScript: main(arguments) if arguments provided, else main()
    """
    pass

Provider Implementations:

Self-Managed Provider (self_managed.py:164):
- Passes arguments via HTTP API: "arguments": arguments or {}
- executor_manager receives and passes to code via command line
- Runner script: args = json.loads(sys.argv[1]) then result = main(**args)

Aliyun Code Interpreter (aliyun_codeinterpreter.py:260-275):

Wraps user code to call main(**arguments) or main() if no arguments

Python example:

if arguments:
    wrapped_code = f'''{code}

if __name__ == "__main__":
    import json
    result = main(**{json.dumps(arguments)})
    print(json.dumps(result) if isinstance(result, dict) else result)
'''

JavaScript example:

if arguments:
    wrapped_code = f'''{code}

const result = main({json.dumps(arguments)});
console.log(typeof result === 'object' ? JSON.stringify(result) : String(result));
'''
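
One caveat with the Python wrapping pattern above: interpolating `json.dumps(arguments)` directly into Python source yields JSON literals, and `true`, `false`, and `null` are not valid Python names. A variation that avoids this, sketched below, embeds the JSON text as a string literal and decodes it at run time. `wrap_python_code` is a hypothetical helper for illustration and is not claimed to match what `aliyun_codeinterpreter.py` does.

```python
import json
from typing import Any, Dict, Optional


def wrap_python_code(code: str, arguments: Optional[Dict[str, Any]] = None) -> str:
    """Append a __main__ block that calls main() with JSON-decoded arguments."""
    # repr() turns the JSON text into a valid Python string literal, so the
    # generated source decodes it with json.loads instead of evaluating it.
    payload = repr(json.dumps(arguments or {}))
    return (
        f"{code}\n\n"
        'if __name__ == "__main__":\n'
        "    import json\n"
        f"    result = main(**json.loads({payload}))\n"
        "    print(json.dumps(result) if isinstance(result, dict) else result)\n"
    )
```

With this form, `arguments={"flag": True, "name": None}` reaches `main()` as `True` and `None`, whereas direct interpolation would produce `main(**{"flag": true, "name": null})` and fail with a `NameError`.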

Client Layer (client.py:138-190):

def execute_code(
    code: str,
    language: str = "python",
    timeout: int = 30,
    arguments: Optional[Dict[str, Any]] = None
) -> ExecutionResult:
    provider_manager = get_provider_manager()
    provider = provider_manager.get_provider()

    instance = provider.create_instance(template=language)
    try:
        result = provider.execute_code(
            instance_id=instance.instance_id,
            code=code,
            language=language,
            timeout=timeout,
            arguments=arguments  # Passed through to provider
        )
        return result
    finally:
        provider.destroy_instance(instance.instance_id)

CodeExec Tool Integration (code_exec.py:136-165):

def _execute_code(self, language: str, code: str, arguments: dict):
    # ... collect arguments from component configuration

    result = sandbox_execute_code(
        code=code,
        language=language,
        timeout=int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10 * 60)),
        arguments=arguments  # Passed through to sandbox client
    )

Usage Examples

Python Code with Arguments:

# User code
def main(name: str, count: int) -> dict:
    """Generate greeting"""
    return {"message": f"Hello {name}!" * count}

# Called with: arguments={"name": "World", "count": 3}
# Result: {"message": "Hello World!Hello World!Hello World!"}

JavaScript Code with Arguments:

// User code
function main(args) {
  const { name, count } = args;
  return `Hello ${name}!`.repeat(count);
}

// Called with: arguments={"name": "World", "count": 3}
// Result: "Hello World!Hello World!Hello World!"

Important Notes

Function Signature: Code MUST define a main() function
- Python: def main(**kwargs) or def main() if no arguments
- JavaScript: function main(args) or function main() if no arguments
Type Consistency: Arguments are passed as JSON, so types are preserved:
- Numbers → int/float
- Strings → str
- Booleans → bool
- Objects → dict (Python) / object (JavaScript)
- Arrays → list (Python) / array (JavaScript)
Return Value: Return value is serialized as JSON for parsing
- Python: print(json.dumps(result)) if dict
- JavaScript: console.log(JSON.stringify(result)) if object
Provider Alignment: All providers (self_managed, aliyun_codeinterpreter, e2b) implement arguments passing consistently
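
The type mapping listed under Type Consistency can be checked with a JSON round trip, which also surfaces two cases where Python values do not survive unchanged: tuples come back as lists, and non-string dictionary keys come back as strings. The `roundtrip` helper is a hypothetical name used only for this demonstration.

```python
import json
from typing import Any


def roundtrip(value: Any) -> Any:
    """Serialize to JSON and back, as happens when arguments cross the sandbox boundary."""
    return json.loads(json.dumps(value))


arguments = {
    "count": 3,              # number  -> int
    "ratio": 0.5,            # number  -> float
    "name": "World",         # string  -> str
    "enabled": True,         # boolean -> bool
    "tags": ["a", "b"],      # array   -> list
    "options": {"k": None},  # object  -> dict, null -> None
}

decoded = roundtrip(arguments)
types = {key: type(value).__name__ for key, value in decoded.items()}

# Values without a JSON equivalent are coerced rather than preserved.
tuple_result = roundtrip({"pair": (1, 2)})
int_key_result = roundtrip({1: "a"})
```

Keeping arguments to JSON-native types avoids surprises when the same component runs on different providers.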
