Fix: Merge main branch (#10377)

### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Lynn <lynn_inf@hotmail.com>
Co-authored-by: chanx <1243304602@qq.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com>
Co-authored-by: huangzl <huangzl@shinemo.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Wilmer <33392318@qq.com>
Co-authored-by: Adrian Weidig <adrianweidig@gmx.net>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: Liu An <asiro@qq.com>
Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com>
Co-authored-by: BadwomanCraZY <511528396@qq.com>
Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com>
Co-authored-by: Russell Valentine <russ@coldstonelabs.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Billy Bao <newyorkupperbay@gmail.com>
Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com>
Co-authored-by: TensorNull <tensor.null@gmail.com>
Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com>
Co-authored-by: AB <aj@Ajays-MacBook-Air.local>
Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com>
Co-authored-by: He Wang <wanghechn@qq.com>
Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com>
Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box>
Co-authored-by: Stephen Hu <stephenhu@seismic.com>
Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com>
Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com>
Co-authored-by: mxc <mxc@example.com>
Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com>
Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com>
Co-authored-by: mcoder6425 <mcoder64@gmail.com>
Co-authored-by: TeslaZY <TeslaZY@outlook.com>
Co-authored-by: lemsn <lemsn@msn.com>
Co-authored-by: lemsn <lemsn@126.com>
Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com>
Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com>
Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>
Kevin Hu
2025-09-30 13:13:15 +08:00
committed by GitHub
parent 4d6ff672eb
commit 20b577a72c
201 changed files with 7929 additions and 1110 deletions

@ -0,0 +1,222 @@
# Installation Guide for Firecrawl RAGFlow Integration
This guide covers installing and configuring the Firecrawl integration plugin for RAGFlow.
## Prerequisites
- RAGFlow instance running (version 0.20.5 or later)
- Python 3.8 or higher
- Firecrawl API key (get one at [firecrawl.dev](https://firecrawl.dev))
## Installation Methods
### Method 1: Manual Installation
1. **Download the plugin**:
```bash
git clone https://github.com/firecrawl/firecrawl.git
cd firecrawl/ragflow-firecrawl-integration
```
2. **Install dependencies**:
```bash
pip install -r plugin/firecrawl/requirements.txt
```
3. **Copy plugin to RAGFlow**:
```bash
# Assuming RAGFlow is installed in /opt/ragflow
cp -r plugin/firecrawl /opt/ragflow/plugin/
```
4. **Restart RAGFlow**:
```bash
# Restart RAGFlow services
docker compose -f /opt/ragflow/docker/docker-compose.yml restart
```
### Method 2: Using pip (if the package is published)
```bash
pip install ragflow-firecrawl-integration
```
### Method 3: Development Installation
1. **Clone the repository**:
```bash
git clone https://github.com/firecrawl/firecrawl.git
cd firecrawl/ragflow-firecrawl-integration
```
2. **Install in development mode**:
```bash
pip install -e .
```
## Configuration
### 1. Get Firecrawl API Key
1. Visit [firecrawl.dev](https://firecrawl.dev)
2. Sign up for a free account
3. Navigate to your dashboard
4. Copy your API key (starts with `fc-`)
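Before pasting the key into RAGFlow, a quick format check can catch copy-and-paste mistakes. The following is a minimal sketch that only tests the `fc-` prefix mentioned above; the function name `looks_like_firecrawl_key` is hypothetical and not part of the plugin, and passing this check does not prove the key is active.
```python
def looks_like_firecrawl_key(api_key: str) -> bool:
    """Return True if the key starts with `fc-` and has characters after it.

    Hypothetical helper for illustration; it checks format only and
    never contacts the Firecrawl API.
    """
    key = api_key.strip()  # tolerate stray whitespace from copy and paste
    prefix = "fc-"
    return key.startswith(prefix) and len(key) > len(prefix)


if __name__ == "__main__":
    print(looks_like_firecrawl_key("fc-your-api-key-here"))
```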
### 2. Configure in RAGFlow
1. **Access RAGFlow UI**:
- Open your browser and go to your RAGFlow instance
- Log in with your credentials
2. **Add Firecrawl Data Source**:
- Go to "Data Sources" → "Add New Source"
- Select "Firecrawl Web Scraper"
- Enter your API key
- Configure additional options if needed (see [Configuration Options](#configuration-options))
3. **Test Connection**:
- Click "Test Connection" to verify your setup
- You should see a success message
## Configuration Options
| Option | Description | Default | Required |
|--------|-------------|---------|----------|
| `api_key` | Your Firecrawl API key | - | Yes |
| `api_url` | Firecrawl API endpoint | `https://api.firecrawl.dev` | No |
| `max_retries` | Maximum retry attempts | 3 | No |
| `timeout` | Request timeout (seconds) | 30 | No |
| `rate_limit_delay` | Delay between requests (seconds) | 1.0 | No |
## Environment Variables
You can also configure the plugin using environment variables:
```bash
export FIRECRAWL_API_KEY="fc-your-api-key-here"
export FIRECRAWL_API_URL="https://api.firecrawl.dev"
export FIRECRAWL_MAX_RETRIES="3"
export FIRECRAWL_TIMEOUT="30"
export FIRECRAWL_RATE_LIMIT_DELAY="1.0"
```
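To illustrate how these variables map onto the options table, here is a minimal sketch that reads them with the documented defaults. The `FirecrawlSettings` class and `load_settings` function are hypothetical names chosen for this example; the plugin's own loader may differ.
```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FirecrawlSettings:
    """Hypothetical container mirroring the options table above."""
    api_key: str
    api_url: str = "https://api.firecrawl.dev"
    max_retries: int = 3
    timeout: int = 30
    rate_limit_delay: float = 1.0


def load_settings(env: Optional[Mapping[str, str]] = None) -> FirecrawlSettings:
    """Build settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("FIRECRAWL_API_KEY", "")
    if not api_key:
        # api_key is the only required option in the table
        raise ValueError("FIRECRAWL_API_KEY is required")
    return FirecrawlSettings(
        api_key=api_key,
        api_url=env.get("FIRECRAWL_API_URL", "https://api.firecrawl.dev"),
        max_retries=int(env.get("FIRECRAWL_MAX_RETRIES", "3")),
        timeout=int(env.get("FIRECRAWL_TIMEOUT", "30")),
        rate_limit_delay=float(env.get("FIRECRAWL_RATE_LIMIT_DELAY", "1.0")),
    )
```
Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in tests.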
## Verification
### 1. Check Plugin Installation
```bash
# Check if the plugin directory exists
ls -la /opt/ragflow/plugin/firecrawl/
# Should show:
# __init__.py
# firecrawl_connector.py
# firecrawl_config.py
# firecrawl_processor.py
# firecrawl_ui.py
# ragflow_integration.py
# requirements.txt
```
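The same check can be scripted. This sketch reports which of the files listed above are missing from a plugin directory; the function name `missing_plugin_files` is hypothetical, and the `/opt/ragflow` path is the same assumption this guide makes.
```python
from pathlib import Path
from typing import List

# File names taken from the expected listing above
EXPECTED_FILES = [
    "__init__.py",
    "firecrawl_connector.py",
    "firecrawl_config.py",
    "firecrawl_processor.py",
    "firecrawl_ui.py",
    "ragflow_integration.py",
    "requirements.txt",
]


def missing_plugin_files(plugin_dir: str) -> List[str]:
    """Return the expected files that do not exist in plugin_dir."""
    root = Path(plugin_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed install location; adjust to your RAGFlow path
    missing = missing_plugin_files("/opt/ragflow/plugin/firecrawl")
    print("Missing files:", missing if missing else "none")
```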
### 2. Test the Integration
```bash
# Run the example script (assumes example_usage.py was copied along with the plugin)
cd /opt/ragflow/plugin/firecrawl/
python example_usage.py
```
### 3. Check RAGFlow Logs
```bash
# Check RAGFlow server logs
docker logs ragflow-server
# Look for messages like:
# "Firecrawl plugin loaded successfully"
# "Firecrawl data source registered"
```
## Troubleshooting
### Common Issues
1. **Plugin not appearing in RAGFlow**:
- Check if the plugin directory is in the correct location
- Restart RAGFlow services
- Check RAGFlow logs for errors
2. **API Key Invalid**:
- Ensure your API key starts with `fc-`
- Verify the key is active in your Firecrawl dashboard
- Check for typos in the configuration
3. **Connection Timeout**:
- Increase the `timeout` value (default: 30 seconds)
- Check your network connection
- Verify the API URL is correct
4. **Rate Limiting**:
- Increase the `rate_limit_delay` value
- Reduce the number of concurrent requests
- Check your Firecrawl usage limits
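To make the interaction between `max_retries` and `rate_limit_delay` concrete, here is a minimal sketch of a retry loop whose wait grows linearly with each failed attempt. It is an illustration only: `call_with_retries` is a hypothetical helper, and the plugin's actual retry behavior is not described in this guide.
```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    rate_limit_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying up to max_retries times after a failure.

    The wait before retry n is rate_limit_delay * n (linear backoff).
    Hypothetical helper; real code should catch narrower exception types.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempts = max_retries + 1  # the first call plus the retries
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(rate_limit_delay * (attempt + 1))
    raise AssertionError("unreachable")
```
Raising `rate_limit_delay` in this model spaces out every retry, which is why it is the first setting to adjust when requests are being rate limited.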
### Debug Mode
Enable debug logging to see detailed information:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
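Setting the root logger to `DEBUG` also turns on verbose output from every other library in the process. As an alternative, the sketch below enables debug output for a single named logger. The logger name `firecrawl` is an assumption; this guide does not state which logger name the plugin uses, so substitute the real one.
```python
import logging


def enable_debug_logging(logger_name: str = "firecrawl") -> logging.Logger:
    """Enable DEBUG for one logger while the root logger stays at INFO.

    The default logger name is an assumption made for this example.
    """
    # basicConfig does nothing if the root logger already has handlers
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    return logger


if __name__ == "__main__":
    log = enable_debug_logging()
    log.debug("debug logging enabled for %s", log.name)
```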
### Check Dependencies
```bash
# Verify all dependencies are installed
pip list | grep -E "(aiohttp|pydantic|requests)"
# Each package should be listed at or above these versions:
# aiohttp   3.8.0
# pydantic  2.0.0
# requests  2.28.0
```
```
## Uninstallation
To remove the plugin:
1. **Remove plugin directory**:
```bash
rm -rf /opt/ragflow/plugin/firecrawl/
```
2. **Restart RAGFlow**:
```bash
docker compose -f /opt/ragflow/docker/docker-compose.yml restart
```
3. **Remove the pip package** (optional, only if you installed with Method 2 or 3):
```bash
pip uninstall ragflow-firecrawl-integration
```
## Support
If you encounter issues:
1. Check the [troubleshooting section](#troubleshooting)
2. Review RAGFlow logs for error messages
3. Verify your Firecrawl API key and configuration
4. Check the [Firecrawl documentation](https://docs.firecrawl.dev)
5. Open an issue in the [Firecrawl repository](https://github.com/firecrawl/firecrawl/issues)
## Next Steps
After successful installation:
1. Read the [README.md](README.md) for usage examples
2. Try scraping a simple URL to test the integration
3. Explore the different scraping options (single URL, crawl, batch)
4. Configure your RAGFlow workflows to use the scraped content