Files
ragflow/intergrations/firecrawl/INSTALLATION.md
Zhichang Yu 73144e278b Don't release full image (#10654)
### What problem does this PR solve?

Introduced gpu profile in .env
Added Dockerfile_tei
fix datrie
Removed LIGHTEN flag

### Type of change

- [x] Documentation Update
- [x] Refactoring
2025-10-23 23:02:27 +08:00

5.1 KiB

Installation Guide for Firecrawl RAGFlow Integration

This guide will help you install and configure the Firecrawl integration plugin for RAGFlow.

Prerequisites

  • RAGFlow instance running (version 0.20.5 or later)
  • Python 3.8 or higher
  • Firecrawl API key (get one at firecrawl.dev)

Installation Methods

Method 1: Manual Installation

  1. Download the plugin:

    git clone https://github.com/firecrawl/firecrawl.git
    cd firecrawl/ragflow-firecrawl-integration
    
  2. Install dependencies:

    pip install -r plugin/firecrawl/requirements.txt
    
  3. Copy plugin to RAGFlow:

    # Assuming RAGFlow is installed in /opt/ragflow
    cp -r plugin/firecrawl /opt/ragflow/plugin/
    
  4. Restart RAGFlow:

    # Restart RAGFlow services
    docker compose -f /opt/ragflow/docker/docker-compose.yml restart
    

Method 2: Using pip (if available)

pip install ragflow-firecrawl-integration

Method 3: Development Installation

  1. Clone the repository:

    git clone https://github.com/firecrawl/firecrawl.git
    cd firecrawl/ragflow-firecrawl-integration
    
  2. Install in development mode:

    pip install -e .
    

Configuration

1. Get Firecrawl API Key

  1. Visit firecrawl.dev
  2. Sign up for a free account
  3. Navigate to your dashboard
  4. Copy your API key (starts with fc-)

2. Configure in RAGFlow

  1. Access RAGFlow UI:

    • Open your browser and go to your RAGFlow instance
    • Log in with your credentials
  2. Add Firecrawl Data Source:

    • Go to "Data Sources" → "Add New Source"
    • Select "Firecrawl Web Scraper"
    • Enter your API key
    • Configure additional options if needed
  3. Test Connection:

    • Click "Test Connection" to verify your setup
    • You should see a success message

Configuration Options

Option Description Default Required
api_key Your Firecrawl API key - Yes
api_url Firecrawl API endpoint https://api.firecrawl.dev No
max_retries Maximum retry attempts 3 No
timeout Request timeout (seconds) 30 No
rate_limit_delay Delay between requests (seconds) 1.0 No

Environment Variables

You can also configure the plugin using environment variables:

export FIRECRAWL_API_KEY="fc-your-api-key-here"
export FIRECRAWL_API_URL="https://api.firecrawl.dev"
export FIRECRAWL_MAX_RETRIES="3"
export FIRECRAWL_TIMEOUT="30"
export FIRECRAWL_RATE_LIMIT_DELAY="1.0"

Verification

1. Check Plugin Installation

# Check if the plugin directory exists
ls -la /opt/ragflow/plugin/firecrawl/

# Should show:
# __init__.py
# firecrawl_connector.py
# firecrawl_config.py
# firecrawl_processor.py
# firecrawl_ui.py
# ragflow_integration.py
# requirements.txt

2. Test the Integration

# Run the example script
cd /opt/ragflow/plugin/firecrawl/
python example_usage.py

3. Check RAGFlow Logs

# Check RAGFlow server logs
docker logs docker-ragflow-cpu-1

# Look for messages like:
# "Firecrawl plugin loaded successfully"
# "Firecrawl data source registered"

Troubleshooting

Common Issues

  1. Plugin not appearing in RAGFlow:

    • Check if the plugin directory is in the correct location
    • Restart RAGFlow services
    • Check RAGFlow logs for errors
  2. API Key Invalid:

    • Ensure your API key starts with fc-
    • Verify the key is active in your Firecrawl dashboard
    • Check for typos in the configuration
  3. Connection Timeout:

    • Increase the timeout value in configuration
    • Check your network connection
    • Verify the API URL is correct
  4. Rate Limiting:

    • Increase the rate_limit_delay value
    • Reduce the number of concurrent requests
    • Check your Firecrawl usage limits

Debug Mode

Enable debug logging to see detailed information:

import logging
logging.basicConfig(level=logging.DEBUG)

Check Dependencies

# Verify all dependencies are installed
pip list | grep -E "(aiohttp|pydantic|requests)"

# Should show:
# aiohttp>=3.8.0
# pydantic>=2.0.0
# requests>=2.28.0

Uninstallation

To remove the plugin:

  1. Remove plugin directory:

    rm -rf /opt/ragflow/plugin/firecrawl/
    
  2. Restart RAGFlow:

    docker compose -f /opt/ragflow/docker/docker-compose.yml restart
    
  3. Remove dependencies (optional):

    pip uninstall ragflow-firecrawl-integration
    

Support

If you encounter issues:

  1. Check the troubleshooting section
  2. Review RAGFlow logs for error messages
  3. Verify your Firecrawl API key and configuration
  4. Check the Firecrawl documentation
  5. Open an issue in the Firecrawl repository

Next Steps

After successful installation:

  1. Read the README.md for usage examples
  2. Try scraping a simple URL to test the integration
  3. Explore the different scraping options (single URL, crawl, batch)
  4. Configure your RAGFlow workflows to use the scraped content