## Description
This PR implements comprehensive OceanBase performance monitoring and
health check functionality as requested in issue #12772. The
implementation follows the existing ES/Infinity health check patterns
and provides detailed metrics for operations teams.
## Problem
Currently, RAGFlow lacks detailed health monitoring for OceanBase when
used as the document engine. Operations teams need visibility into:
- Connection status and latency
- Storage space usage
- Query throughput (QPS)
- Slow query statistics
- Connection pool utilization
## Solution
### 1. Enhanced OBConnection Class (`rag/utils/ob_conn.py`)
Added comprehensive performance monitoring methods:
- `get_performance_metrics()` - Main method returning all performance
metrics
- `_get_storage_info()` - Retrieves database storage usage
- `_get_connection_pool_stats()` - Gets connection pool statistics
- `_get_slow_query_count()` - Counts queries exceeding threshold
- `_estimate_qps()` - Estimates queries per second
- Enhanced `health()` method with connection status
### 2. Health Check Utilities (`api/utils/health_utils.py`)
Added two new functions following ES/Infinity patterns:
- `get_oceanbase_status()` - Returns OceanBase status with health and
performance metrics
- `check_oceanbase_health()` - Comprehensive health check with detailed
metrics
### 3. API Endpoint (`api/apps/system_app.py`)
Added new endpoint:
- `GET /v1/system/oceanbase/status` - Returns OceanBase health status
and performance metrics
### 4. Comprehensive Unit Tests
(`test/unit_test/utils/test_oceanbase_health.py`)
Added 340+ lines of unit tests covering:
- Health check success/failure scenarios
- Performance metrics retrieval
- Error handling and edge cases
- Connection pool statistics
- Storage information retrieval
- QPS estimation
- Slow query detection
## Metrics Provided
- **Connection Status**: connected/disconnected
- **Latency**: Query latency in milliseconds
- **Storage**: Used and total storage space
- **QPS**: Estimated queries per second
- **Slow Queries**: Count of queries exceeding threshold
- **Connection Pool**: Active connections, max connections, pool size
## Testing
- All unit tests pass
- Error handling tested for connection failures
- Edge cases covered (missing tables, connection errors)
- Follows existing code patterns and conventions
## Code Statistics
- **Total Lines Changed**: 665+ lines
- **New Code**: ~600 lines
- **Test Coverage**: 340+ lines of comprehensive tests
- **Files Modified**: 3
- **Files Created**: 1 (test file)
## Acceptance Criteria Met
✅ `/system/oceanbase/status` API returns OceanBase health status
✅ Monitoring metrics accurately reflect OceanBase running status
✅ Clear error messages when health checks fail
✅ Response time optimized (metrics cached where possible)
✅ Follows existing ES/Infinity health check patterns
✅ Comprehensive test coverage
## Related Files
- `rag/utils/ob_conn.py` - OceanBase connection class
- `api/utils/health_utils.py` - Health check utilities
- `api/apps/system_app.py` - System API endpoints
- `test/unit_test/utils/test_oceanbase_health.py` - Unit tests
Fixes#12772
---------
Co-authored-by: Daniel <daniel@example.com>
### What problem does this PR solve?
Adds a CLI-based retrieval test to CI after the Elasticsearch HTTP API
tests to validate end-to-end admin/user flows and dataset retrieval via
ragflow_cli.py. This helps catch regressions in the CLI path that aren’t
covered by existing API tests.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Added Redis port calculation and environment variable export to support
Redis service in test environment. The port is dynamically assigned
based on runner number to prevent conflicts during parallel test
execution. Removed by #12685
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Added project instructions for setting up and running the application.
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Align MySQL defaults between docker/.env and
docker/service_conf.yaml.template
close#12645
### Type of change
- [x] Other (please describe):Unify MySQL configuration
### What problem does this PR solve?
As title
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [x] Other (please describe): CI
Potential fix for
[https://github.com/infiniflow/ragflow/security/code-scanning/62](https://github.com/infiniflow/ragflow/security/code-scanning/62)
In general, the fix is to explicitly declare a `permissions:` block so
the GITHUB_TOKEN used by this workflow only has the scopes required:
read access to repository contents and write access to
contents/releases. Since this workflow creates or moves tags and
creates/overwrites releases via `softprops/action-gh-release`, it needs
`contents: write`. There is no evidence that it needs other elevated
scopes (issues, pull-requests, actions, etc.), so these should remain at
their default of `none` by omission.
The best minimal fix without changing existing functionality is to add a
workflow-level `permissions:` block near the top of
`.github/workflows/release.yml`, after `name:` and before `on:` (or
anywhere at the root level, but this is conventional). This will apply
to all jobs (there is only `jobs.release`) and ensure that the
GITHUB_TOKEN has only `contents: write`. No additional imports or
methods are needed because this is a YAML configuration change only.
Concretely:
- Edit `.github/workflows/release.yml`.
- Insert:
```yaml
permissions:
contents: write
```
between line 2 (empty line after `name: release`) and line 3 (`on:`). No
other lines need to be changed.
_Suggested fixes powered by Copilot Autofix. Review carefully before
merging._
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
### What problem does this PR solve?
as title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
As title
### Type of change
- [x] Other (please describe): Update GitHub action
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Output log to file when run infinity tests.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
As title
### Type of change
- [x] Other (please describe):
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Add '|| true' to docker rmi command to prevent workflow failure when
image removal fails. This ensures the CI pipeline continues even if the
Docker image cannot be removed for any reason.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Introduced gpu profile in .env
Added Dockerfile_tei
fix datrie
Removed LIGHTEN flag
### Type of change
- [x] Documentation Update
- [x] Refactoring
…release workflow (#10039)
This change updates the GitHub Actions workflow to push additional
stable tags alongside version tags, enabling automated update tools like
Watchtower to detect and pull the latest images correctly.
Refs:
[https://github.com/infiniflow/ragflow/issues/10039](https://github.com/infiniflow/ragflow/issues/10039)
### What problem does this PR solve?
Automated container update tools such as Watchtower rely on stable tags
like `latest` to identify the newest images. Previously, only
version-specific tags were pushed, which prevented these tools from
detecting new releases automatically. This PR adds multiple stable tags
(`latest-full`, `latest-slim`) alongside version tags to the Docker
image publishing workflow, ensuring smooth and reliable automated
updates without manual tag management.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
Fix the issue in ci.
[ci
err](https://github.com/infiniflow/ragflow/actions/runs/17452439789/job/49559702590?pr=9894)
```
Container ragflow-redis Error response from daemon: Conflict. The container name "/ragflow-redis" is already in use by container "b6cbde4d186ffba701f6e2a85f37e1d053d7197adb2938547f1df08cfcadf355". You have to remove (or rename) that container to be able to reuse that name.
Error response from daemon: Conflict. The container name "/ragflow-redis" is already in use by container "b6cbde4d186ffba701f6e2a85f37e1d053d7197adb2938547f1df08cfcadf355". You have to remove (or rename) that container to be able to reuse that name.
Error: Process completed with exit code 1.
```
### Type of change
- [x] Refactoring
- [x] Performance Improvement
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
### What problem does this PR solve?
- Replace manual venv activation with `uv run` for pytest commands
- Add dynamic test level (p2/p3) based on GitHub event type
- Simplify test commands by removing redundant directory changes
### Type of change
- [x] Update Action
### What problem does this PR solve?
Try the best to repair corrupted PDF files on upload automatically.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)