Files
ragflow/agent/sandbox/client.py
Zhichang Yu fd11aca8e5 feat: Implement pluggable multi-provider sandbox architecture (#12820)
## Summary

Implement a flexible sandbox provider system supporting both
self-managed (Docker) and SaaS (Aliyun Code Interpreter) backends for
secure code execution in agent workflows.

**Key Changes:**
-  Aliyun Code Interpreter provider using official
`agentrun-sdk>=0.0.16`
-  Self-managed provider with gVisor (runsc) security
-  Arguments parameter support for dynamic code execution
-  Database-only configuration (removed fallback logic)
-  Configuration scripts for quick setup

Issue #12479

## Features

### 🔌 Provider Abstraction Layer

**1. Self-Managed Provider** (`agent/sandbox/providers/self_managed.py`)
- Wraps existing executor_manager HTTP API
- gVisor (runsc) for secure container isolation
- Configurable pool size, timeout, retry logic
- Languages: Python, Node.js, JavaScript
- ⚠️ **Requires**: gVisor installation, Docker, base images

**2. Aliyun Code Interpreter**
(`agent/sandbox/providers/aliyun_codeinterpreter.py`)
- SaaS integration using official agentrun-sdk
- Serverless microVM execution with auto-authentication
- Hard timeout: 30 seconds max
- Credentials: `AGENTRUN_ACCESS_KEY_ID`, `AGENTRUN_ACCESS_KEY_SECRET`,
`AGENTRUN_ACCOUNT_ID`, `AGENTRUN_REGION`
- Automatically wraps code to call `main()` function

**3. E2B Provider** (`agent/sandbox/providers/e2b.py`)
- Placeholder for future integration

### ⚙️ Configuration System

- `conf/system_settings.json`: Default provider =
`aliyun_codeinterpreter`
- `agent/sandbox/client.py`: Enforces database-only configuration
- Admin UI: `/admin/sandbox-settings`
- Configuration validation via `validate_config()` method
- Health checks for all providers

### 🎯 Key Capabilities

**Arguments Parameter Support:**
All providers support passing arguments to `main()` function:
```python
# User code
def main(name: str, count: int) -> dict:
    return {"message": f"Hello {name}!" * count}

# Executed with: arguments={"name": "World", "count": 3}
# Result: {"message": "Hello World!Hello World!Hello World!"}
```

**Self-Describing Providers:**
Each provider implements `get_config_schema()` returning form
configuration for Admin UI

**Error Handling:**
Structured `ExecutionResult` with stdout, stderr, exit_code,
execution_time

## Configuration Scripts

Two scripts for quick Aliyun sandbox setup:

**Shell Script (requires jq):**
```bash
source scripts/configure_aliyun_sandbox.sh
```

**Python Script (interactive):**
```bash
python3 scripts/configure_aliyun_sandbox.py
```

## Testing

```bash
# Unit tests
uv run pytest agent/sandbox/tests/test_providers.py -v

# Aliyun provider tests
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v

# Integration tests (requires credentials)
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v

# Quick SDK validation
python3 agent/sandbox/tests/verify_sdk.py
```

**Test Coverage:**
- 30 unit tests for provider abstraction
- Provider-specific tests for Aliyun
- Integration tests with real API
- Security tests for executor_manager

## Documentation

- `docs/develop/sandbox_spec.md` - Complete architecture specification
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration from legacy
sandbox
- `agent/sandbox/tests/QUICKSTART.md` - Quick start guide
- `agent/sandbox/tests/README.md` - Testing documentation

## Breaking Changes

⚠️ **Migration Required:**

1. **Directory Move**: `sandbox/` → `agent/sandbox/`
   - Update imports: `from sandbox.` → `from agent.sandbox.`

2. **Mandatory Configuration**: 
   - SystemSettings must have `sandbox.provider_type` configured
   - Removed fallback default values
- Configuration must exist in database (from
`conf/system_settings.json`)

3. **Aliyun Credentials**:
   - Requires `AGENTRUN_*` environment variables (not `ALIYUN_*`)
   - `AGENTRUN_ACCOUNT_ID` is now required (Aliyun primary account ID)

4. **Self-Managed Provider**:
   - gVisor (runsc) must be installed for security
   - Install: `go install gvisor.dev/gvisor/runsc@latest`

## Database Schema Changes

```python
# SystemSettings.value: CharField → TextField
api/db/db_models.py: Changed for unlimited config length

# SystemSettingsService.get_by_name(): Fixed query precision
api/db/services/system_settings_service.py: startswith → exact match
```

## Files Changed

### Backend (Python)
- `agent/sandbox/providers/base.py` - SandboxProvider ABC interface
- `agent/sandbox/providers/manager.py` - ProviderManager
- `agent/sandbox/providers/self_managed.py` - Self-managed provider
- `agent/sandbox/providers/aliyun_codeinterpreter.py` - Aliyun provider
- `agent/sandbox/providers/e2b.py` - E2B provider (placeholder)
- `agent/sandbox/client.py` - Unified client (enforces DB-only config)
- `agent/tools/code_exec.py` - Updated to use provider system
- `admin/server/services.py` - SandboxMgr with registry & validation
- `admin/server/routes.py` - 5 sandbox API endpoints
- `conf/system_settings.json` - Default: aliyun_codeinterpreter
- `api/db/db_models.py` - TextField for SystemSettings.value
- `api/db/services/system_settings_service.py` - Exact match query

### Frontend (TypeScript/React)
- `web/src/pages/admin/sandbox-settings.tsx` - Settings UI
- `web/src/services/admin-service.ts` - Sandbox service functions
- `web/src/services/admin.service.d.ts` - Type definitions
- `web/src/utils/api.ts` - Sandbox API endpoints

### Documentation
- `docs/develop/sandbox_spec.md` - Architecture spec
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration guide
- `agent/sandbox/tests/QUICKSTART.md` - Quick start
- `agent/sandbox/tests/README.md` - Testing guide

### Configuration Scripts
- `scripts/configure_aliyun_sandbox.sh` - Shell script (jq)
- `scripts/configure_aliyun_sandbox.py` - Python script

### Tests
- `agent/sandbox/tests/test_providers.py` - 30 unit tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter.py` - Provider tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py` -
Integration tests
- `agent/sandbox/tests/verify_sdk.py` - SDK validation

## Architecture

```
Admin UI → Admin API → SandboxMgr → ProviderManager → [SelfManaged|Aliyun|E2B]
                                      ↓
                                  SystemSettings
```

## Usage

### 1. Configure Provider

**Via Admin UI:**
1. Navigate to `/admin/sandbox-settings`
2. Select provider (Aliyun Code Interpreter / Self-Managed)
3. Fill in configuration
4. Click "Test Connection" to verify
5. Click "Save" to apply

**Via Configuration Scripts:**
```bash
# Aliyun provider
export AGENTRUN_ACCESS_KEY_ID="xxx"
export AGENTRUN_ACCESS_KEY_SECRET="yyy"
export AGENTRUN_ACCOUNT_ID="zzz"
export AGENTRUN_REGION="cn-shanghai"
source scripts/configure_aliyun_sandbox.sh
```

### 2. Restart Service

```bash
cd docker
docker compose restart ragflow-server
```

### 3. Execute Code in Agent

```python
from agent.sandbox.client import execute_code

result = execute_code(
    code='def main(name: str) -> dict: return {"message": f"Hello {name}!"}',
    language="python",
    timeout=30,
    arguments={"name": "World"}
)

print(result.stdout)  # {"message": "Hello World!"}
```

## Troubleshooting

### "Container pool is busy" (Self-Managed)
- **Cause**: Pool exhausted (default: 1 container in `.env`)
- **Fix**: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` to 5+

### "Sandbox provider type not configured"
- **Cause**: Database missing configuration
- **Fix**: Run config script or set via Admin UI

### "gVisor not found"
- **Cause**: runsc not installed
- **Fix**: `go install gvisor.dev/gvisor/runsc@latest && sudo cp
~/go/bin/runsc /usr/local/bin/`

### Aliyun authentication errors
- **Cause**: Wrong environment variable names
- **Fix**: Use `AGENTRUN_*` prefix (not `ALIYUN_*`)

## Checklist

- [x] All tests passing (30 unit tests + integration tests)
- [x] Documentation updated (spec, migration guide, quickstart)
- [x] Type definitions added (TypeScript)
- [x] Admin UI implemented
- [x] Configuration validation
- [x] Health checks implemented
- [x] Error handling with structured results
- [x] Breaking changes documented
- [x] Configuration scripts created
- [x] gVisor requirements documented

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 13:28:21 +08:00

240 lines
7.1 KiB
Python

#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
"""
Sandbox client for agent components.
This module provides a unified interface for agent components to interact
with the configured sandbox provider.
"""
import json
import logging
from typing import Dict, Any, Optional
from api.db.services.system_settings_service import SystemSettingsService
from agent.sandbox.providers import ProviderManager
from agent.sandbox.providers.base import ExecutionResult
logger = logging.getLogger(__name__)
# Global provider manager instance
_provider_manager: Optional[ProviderManager] = None
def get_provider_manager() -> ProviderManager:
"""
Get the global provider manager instance.
Returns:
ProviderManager instance with active provider loaded
"""
global _provider_manager
if _provider_manager is not None:
return _provider_manager
# Initialize provider manager with system settings
_provider_manager = ProviderManager()
_load_provider_from_settings()
return _provider_manager
def _load_provider_from_settings() -> None:
"""
Load sandbox provider from system settings and configure the provider manager.
This function reads the system settings to determine which provider is active
and initializes it with the appropriate configuration.
"""
global _provider_manager
if _provider_manager is None:
return
try:
# Get active provider type
provider_type_settings = SystemSettingsService.get_by_name("sandbox.provider_type")
if not provider_type_settings:
raise RuntimeError(
"Sandbox provider type not configured. Please set 'sandbox.provider_type' in system settings."
)
provider_type = provider_type_settings[0].value
# Get provider configuration
provider_config_settings = SystemSettingsService.get_by_name(f"sandbox.{provider_type}")
if not provider_config_settings:
logger.warning(f"No configuration found for provider: {provider_type}")
config = {}
else:
try:
config = json.loads(provider_config_settings[0].value)
except json.JSONDecodeError as e:
logger.error(f"Failed to parse sandbox config for {provider_type}: {e}")
config = {}
# Import and instantiate the provider
from agent.sandbox.providers import (
SelfManagedProvider,
AliyunCodeInterpreterProvider,
E2BProvider,
)
provider_classes = {
"self_managed": SelfManagedProvider,
"aliyun_codeinterpreter": AliyunCodeInterpreterProvider,
"e2b": E2BProvider,
}
if provider_type not in provider_classes:
logger.error(f"Unknown provider type: {provider_type}")
return
provider_class = provider_classes[provider_type]
provider = provider_class()
# Initialize the provider
if not provider.initialize(config):
logger.error(f"Failed to initialize sandbox provider: {provider_type}. Config keys: {list(config.keys())}")
return
# Set the active provider
_provider_manager.set_provider(provider_type, provider)
logger.info(f"Sandbox provider '{provider_type}' initialized successfully")
except Exception as e:
logger.error(f"Failed to load sandbox provider from settings: {e}")
import traceback
traceback.print_exc()
def reload_provider() -> None:
"""
Reload the sandbox provider from system settings.
Use this function when sandbox settings have been updated.
"""
global _provider_manager
_provider_manager = None
_load_provider_from_settings()
def execute_code(
code: str,
language: str = "python",
timeout: int = 30,
arguments: Optional[Dict[str, Any]] = None
) -> ExecutionResult:
"""
Execute code in the configured sandbox.
This is the main entry point for agent components to execute code.
Args:
code: Source code to execute
language: Programming language (python, nodejs, javascript)
timeout: Maximum execution time in seconds
arguments: Optional arguments dict to pass to main() function
Returns:
ExecutionResult containing stdout, stderr, exit_code, and metadata
Raises:
RuntimeError: If no provider is configured or execution fails
"""
provider_manager = get_provider_manager()
if not provider_manager.is_configured():
raise RuntimeError(
"No sandbox provider configured. Please configure sandbox settings in the admin panel."
)
provider = provider_manager.get_provider()
# Create a sandbox instance
instance = provider.create_instance(template=language)
try:
# Execute the code
result = provider.execute_code(
instance_id=instance.instance_id,
code=code,
language=language,
timeout=timeout,
arguments=arguments
)
return result
finally:
# Clean up the instance
try:
provider.destroy_instance(instance.instance_id)
except Exception as e:
logger.warning(f"Failed to destroy sandbox instance {instance.instance_id}: {e}")
def health_check() -> bool:
"""
Check if the sandbox provider is healthy.
Returns:
True if provider is configured and healthy, False otherwise
"""
try:
provider_manager = get_provider_manager()
if not provider_manager.is_configured():
return False
provider = provider_manager.get_provider()
return provider.health_check()
except Exception as e:
logger.error(f"Sandbox health check failed: {e}")
return False
def get_provider_info() -> Dict[str, Any]:
"""
Get information about the current sandbox provider.
Returns:
Dictionary with provider information:
- provider_type: Type of the active provider
- configured: Whether provider is configured
- healthy: Whether provider is healthy
"""
try:
provider_manager = get_provider_manager()
return {
"provider_type": provider_manager.get_provider_name(),
"configured": provider_manager.is_configured(),
"healthy": health_check(),
}
except Exception as e:
logger.error(f"Failed to get provider info: {e}")
return {
"provider_type": None,
"configured": False,
"healthy": False,
}