ed6a76dcc0
Add Firecrawl integration for RAGFlow (#10152)
## 🚀 Firecrawl Integration for RAGFlow
This PR implements the Firecrawl integration for RAGFlow as requested in
issue https://github.com/firecrawl/firecrawl/issues/2167
### ✅ Features Implemented
- **Data Source Integration**: Firecrawl appears as a selectable data
source in RAGFlow
- **Configuration Management**: Users can input Firecrawl API keys
through RAGFlow's interface
- **Web Scraping**: Supports single URL scraping, website crawling, and
batch processing
- **Content Processing**: Converts scraped content to RAGFlow's document
format with chunking
- **Error Handling**: Handles rate limits, failed requests, and
malformed content
- **UI Components**: Complete UI schema and workflow components for
RAGFlow integration
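
As an illustration of the content-processing step, here is a minimal sketch of overlapping fixed-size chunking. It is not the code in `firecrawl_processor.py`; the `Chunk` record, the `chunk_text` function, and the `chunk_size` / `overlap` defaults are all hypothetical names and values chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record; not RAGFlow's actual document type."""
    source_url: str
    index: int
    text: str


def chunk_text(text: str, source_url: str,
               chunk_size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(source_url, index, text[start:start + chunk_size]))
        # Stop once a window reaches the end, so no tail-only duplicate is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.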
### 📁 Files Added
- `intergrations/firecrawl/` - Complete integration package
- `intergrations/firecrawl/integration.py` - RAGFlow integration entry
point
- `intergrations/firecrawl/firecrawl_connector.py` - API communication
- `intergrations/firecrawl/firecrawl_config.py` - Configuration
management
- `intergrations/firecrawl/firecrawl_processor.py` - Content processing
- `intergrations/firecrawl/firecrawl_ui.py` - UI components
- `intergrations/firecrawl/ragflow_integration.py` - Main integration
class
- `intergrations/firecrawl/README.md` - Complete documentation
- `intergrations/firecrawl/example_usage.py` - Usage examples
### 🧪 Testing
The integration was tested in the following areas:
- Configuration validation
- Connection testing
- Content processing and chunking
- UI component rendering
- Error handling scenarios
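
For reviewers, a configuration-validation check of the kind listed above might look like the sketch below. It assumes settings arrive as a plain dict with hypothetical `api_key` and `api_url` fields; the actual field names and rules live in `firecrawl_config.py` and may differ.

```python
from __future__ import annotations

from urllib.parse import urlparse


def validate_settings(settings: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid.

    The field names `api_key` and `api_url` are assumptions for this sketch.
    """
    errors: list[str] = []

    api_key = settings.get("api_key", "")
    if not isinstance(api_key, str) or not api_key.strip():
        errors.append("api_key is required")

    api_url = settings.get("api_url", "")
    parsed = urlparse(api_url) if isinstance(api_url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("api_url must be an http(s) URL")

    return errors
```

Returning every problem at once, rather than raising on the first, lets a UI form show all invalid fields in one pass.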
### 📋 Acceptance Criteria Met
- ✅ Integration appears as a selectable data source in RAGFlow's data
source options
- ✅ Users can input Firecrawl API keys through RAGFlow's configuration
interface
- ✅ Successfully scrapes content from provided URLs and imports it into
RAGFlow's document store
- ✅ Handles common edge cases (rate limits, failed requests, malformed
content)
- ✅ Includes basic documentation and README updates
- ✅ Code follows RAGFlow's existing patterns and coding standards
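
To make the rate-limit edge case concrete, one common approach is retrying with exponential backoff. The sketch below is not taken from `firecrawl_connector.py`: `RateLimited` and `fetch_with_retry` are hypothetical names, and the fetcher is passed in as a callable so the example makes no assumption about the Firecrawl client API.

```python
import time
from typing import Any, Callable


class RateLimited(Exception):
    """Hypothetical error a fetcher raises when the server answers HTTP 429."""


def fetch_with_retry(fetch: Callable[[str], Any], url: str,
                     max_attempts: int = 4, base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fetch(url), retrying on RateLimited with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except RateLimited:
            # Out of attempts: surface the error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Injecting `sleep` keeps the function testable without real waiting; production code would usually also add jitter and honor a `Retry-After` header when the server sends one.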
### 🔗 Related Issue
https://github.com/firecrawl/firecrawl/issues/2167
---------
Co-authored-by: AB <aj@Ajays-MacBook-Air.local>
2025-09-19 09:58:17 +08:00