### What problem does this PR solve?
Align MySQL defaults between docker/.env and
docker/service_conf.yaml.template
Close #12645
### Type of change
- [x] Other (please describe): Unify MySQL configuration
### What problem does this PR solve?
When uploading multiple files at once, if any of the files is of an
unsupported type and its blob is not removed, a
TypeError('Object of type bytes is not JSON serializable') exception is raised.
This prevents the frontend from receiving a proper response.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
This PR fixes a `KeyError` crash when running RAPTOR tasks on documents
that don't have the expected vector field.
## Related Issue
Fixes https://github.com/infiniflow/ragflow/issues/12675
## Problem
When running RAPTOR tasks, the code assumes all chunks have the vector
field `q_<size>_vec` (e.g., `q_1024_vec`). However, chunks may not have
this field if:
1. They were indexed with a **different embedding model** (different
vector size)
2. The embedding step **failed silently** during initial parsing
3. The document was parsed before the current embedding model was
configured
This caused a crash:
```
KeyError: 'q_1024_vec'
```
## Solution
Added defensive validation in `run_raptor_for_kb()` (a sketch of the guard follows the list below):
1. **Check for vector field existence** before accessing it
2. **Skip chunks** that don't have the required vector field instead of
crashing
3. **Log warnings** for skipped chunks with actionable guidance
4. **Provide informative error messages** suggesting users re-parse
documents with the current embedding model
5. **Handle both scopes** (`file` and `kb` modes)
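A minimal sketch of this kind of guard (the helper name and chunk access pattern are assumptions based on this description, not the exact diff):

```python
import logging

def _filter_chunks_with_vectors(chunks, vector_size):
    """Keep only chunks that carry the expected vector field; warn about the rest."""
    vector_field = f"q_{vector_size}_vec"
    usable, skipped = [], 0
    for chunk in chunks:
        if vector_field not in chunk:
            skipped += 1
            continue
        usable.append(chunk)
    if skipped:
        logging.warning(
            "Skipped %d chunk(s) missing '%s'. Re-parse the document with the "
            "current embedding model to regenerate vectors.",
            skipped, vector_field,
        )
    return usable
```

In the actual change, the same check is applied in both the `file` and `kb` scopes.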
## Changes
- `rag/svr/task_executor.py`: Added validation and error handling in
`run_raptor_for_kb()`
## Testing
1. Create a knowledge base with an embedding model
2. Parse documents
3. Change the embedding model to one with a different vector size
4. Run RAPTOR task
5. **Before**: Crashes with `KeyError`
6. **After**: Gracefully skips incompatible chunks with informative
warnings
---
<!-- Gittensor Contribution Tag: @GlobalStar117 -->
Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>
### What problem does this PR solve?
Fix: The time zone cannot be updated properly in the database. #12696
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1) Create a dataset using the table parser with Infinity
2) Answer questions in chat using SQL
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
Fixes #12651
The Docker container was failing at startup with:
```
/ragflow/.venv/bin/python3: No module named pip
```
This occurred when `USE_DOCLING=true` because the `entrypoint.sh` tries
to use `uv pip install` to install docling at runtime.
## Root Cause
As explained in the issue:
1. `uv sync` creates a minimal, production-focused environment **without
pip**
2. The production stage copies the venv from builder
3. Runtime commands using `uv pip install` fail because pip is not
present
## Solution
Added `python -m ensurepip --upgrade` after `uv sync` in the Dockerfile
to ensure pip is available in the virtual environment:
```dockerfile
uv sync --python 3.12 --frozen && \
# Ensure pip is available in the venv for runtime package installation (fixes#12651)
.venv/bin/python3 -m ensurepip --upgrade
```
This is a minimal change that:
- Ensures pip is installed during build time
- Doesn't change any other behavior
- Allows runtime package installation via `uv pip install` to work
---
This is a Gittensor contribution.
gittensor:user:GlobalStar117
Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>
### What problem does this PR solve?
Add seekdb as a doc_engine; it is the lite version of OceanBase.
Close #12691
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Update answer concatenation logic to handle overlapping values
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The web frontend build sometimes failed because the Vite build process exhausted the JavaScript heap. Here's what happened:

Root cause:
- When building the web frontend with `npm run build`, Vite needs to bundle, transform, and optimize all JavaScript/TypeScript code.
- Node.js has a default maximum heap size of roughly 2 GB.
- The RAGFlow web application is large enough that the build process exceeded this limit.
- This triggered garbage-collection failures ("Ineffective mark-compacts near heap limit") and eventually crashed with exit code 134 (SIGABRT).

The solution I attempted:
I did not find a simple way to reduce Node.js memory usage, so I added `export NODE_OPTIONS="--max-old-space-size=4096"` to the build step to allocate a 4 GB heap for Node.js during the build.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)

The failing build output, for reference:
=> ERROR [builder 6/8] RUN --mount=type=cache,id=ragflow_npm,target=/ro
53.3s
[builder 6/8] RUN
--mount=type=cache,id=ragflow_npm,target=/root/.npm,sharing=locked cd
web && npm install && npm run build:
4.551
4.551 > prepare
4.551 > cd .. && husky web/.husky
4.551
4.810 .git can't be found
4.833 added 7 packages in 4s
4.833
4.833 499 packages are looking for funding
4.833 run npm fund for details
5.206
5.206 > build
5.206 > vite build --mode production
5.206
5.939 vite v7.3.0 building client environment for production...
6.169 transforming...
6.472
6.472 WARN
6.472
6.472
6.472 WARN warn - As of Tailwind CSS v3.3, the @tailwindcss/line-clamp
plugin is now included by default.
6.472
6.472
6.472 WARN warn - Remove it from the plugins array in your configuration
to eliminate this warning.
6.472
53.14
53.14 <--- Last few GCs --->
53.14
53.14 [41:0x55f82d0] 47673 ms: Scavenge (reduce) 2041.5 (2086.0) ->
2038.7 (2079.7) MB, 6.11 / 0.00 ms (average mu = 0.330, current mu =
0.319) allocation failure;
53.14 [41:0x55f82d0] 47727 ms: Scavenge (reduce) 2039.4 (2079.7) ->
2038.7 (2080.2) MB, 5.34 / 0.00 ms (average mu = 0.330, current mu =
0.319) allocation failure;
53.14 [41:0x55f82d0] 47809 ms: Scavenge (reduce) 2039.6 (2080.2) ->
2038.7 (2080.2) MB, 4.59 / 0.00 ms (average mu = 0.330, current mu =
0.319) allocation failure;
53.14
53.14
53.14 <--- JS stacktrace --->
53.14
53.14 FATAL ERROR: Ineffective mark-compacts near heap limit Allocation
failed - JavaScript heap out of memory
53.14 ----- Native stack trace -----
53.14
53.14 1: 0xb76db1 node::OOMErrorHandler(char const*, v8::OOMDetails
const&) [node]
53.14 2: 0xee62f0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*,
char const*, v8::OOMDetails const&) [node]
53.14 3: 0xee65d7
v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char
const*, v8::OOMDetails const&) [node]
53.14 4: 0x10f82d5 [node]
53.14 5: 0x10f8864
v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector)
[node]
53.14 6: 0x110f754
v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector,
v8::internal::GarbageCollectionReason, char const*) [node]
53.14 7: 0x110ff6c
v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace,
v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
53.14 8: 0x11120ca v8::internal::Heap::HandleGCRequest() [node]
53.14 9: 0x107d737 v8::internal::StackGuard::HandleInterrupts() [node]
53.15 10: 0x151fb9a v8::internal::Runtime_StackGuard(int, unsigned
long*, v8::internal::Isolate*) [node]
53.15 11: 0x1959ef6 [node]
53.22 Aborted
[+] up 0/1
⠙ Image docker-ragflow Building 58.0s
Dockerfile:161
160 | COPY docs docs
161 | >>> RUN
--mount=type=cache,id=ragflow_npm,target=/root/.npm,sharing=locked \
162 | >>> cd web && npm install && npm run build
163 |
failed to solve: process "/bin/bash -c cd web && npm install && npm run
build" did not complete successfully: exit code: 134
View build details:
docker-desktop://dashboard/build/default/default/j68n2ke32cd8bte4y8fs471au
## Summary
Fixes #12631
When SQL query results contain NaN (Not a Number) or Infinity values
(e.g., from division by zero or other calculations), the JSON
serialization would fail because **NaN and Infinity are not valid JSON
values**.
This caused the agent interface to show 'undefined' error, as described
in the issue where `EXAMINE_TIMES` became `NaN` and broke the JSON
parsing.
## Root Cause
The `convert_decimals` function in `exesql.py` was only handling
`Decimal` types, but not `float` values that could be `NaN` or
`Infinity`.
When these invalid JSON values were serialized:
```json
{"EXAMINE_TIMES": NaN} // Invalid JSON!
```
The frontend JSON parser would fail, causing the 'undefined' error.
## Solution
Extended `convert_decimals` to detect `float` values and convert
`NaN`/`Infinity` to `null` before JSON serialization:
```python
if isinstance(obj, float):
if math.isnan(obj) or math.isinf(obj):
return None
return obj
```
This ensures all SQL results can be properly serialized to valid JSON.
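For reference, a recursive converter along these lines might look as follows; this is a sketch, not the exact code in `exesql.py`:

```python
import math
from decimal import Decimal

def convert_decimals(obj):
    """Recursively sanitize SQL results for JSON: Decimal -> float, NaN/Infinity -> None."""
    if isinstance(obj, Decimal):
        return convert_decimals(float(obj))  # re-check: Decimal('NaN') also becomes None
    if isinstance(obj, float):
        if math.isnan(obj) or math.isinf(obj):
            return None
        return obj
    if isinstance(obj, dict):
        return {k: convert_decimals(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [convert_decimals(v) for v in obj]
    return obj
```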
---
This is a Gittensor contribution.
gittensor:user:GlobalStar117
Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
### What problem does this PR solve?
As title.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
In `paragraph()` of class `FulltextQueryer`, `len(keywords) / 10` should be
rounded to an integer before being assigned to `minimum_should_match` (a one-line sketch follows below).
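A minimal sketch of the intended fix, assuming `minimum_should_match` must be an integer:

```python
keywords = ["fund", "manager", "risk", "report", "2024", "q3", "revenue"]

# Before: len(keywords) / 10 is a float (0.7 here), which the query DSL rejects.
minimum_should_match = int(len(keywords) / 10)  # -> 0, an integer
```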
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
When database connection is lost, the reconnection logic had a bug: if
the first reconnect attempt failed, the second attempt was not wrapped
in error handling, causing unhandled exceptions.
## Solution
Added proper try-except blocks around the second reconnect attempt in
both MySQL and PostgreSQL database classes to ensure errors are properly
logged and handled.
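A simplified sketch of the pattern, using peewee-style method names; the exact attributes in the RAGFlow classes may differ:

```python
import logging

class ReconnectingDatabase:
    """Illustrative stand-in for RetryingPooledMySQLDatabase / RetryingPooledPostgresqlDatabase."""

    def _handle_connection_loss(self):
        try:
            self.close()
            self.connect(reuse_if_open=True)
            return
        except Exception as first_error:
            logging.warning("First reconnect attempt failed: %s", first_error)
        try:
            # The second attempt is now also guarded, so a failure is logged
            # and re-raised instead of surfacing as an unhandled exception.
            self.close()
            self.connect(reuse_if_open=True)
        except Exception as second_error:
            logging.error("Second reconnect attempt failed: %s", second_error)
            raise
```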
## Changes
- Fixed `_handle_connection_loss()` in `RetryingPooledMySQLDatabase`
- Fixed `_handle_connection_loss()` in
`RetryingPooledPostgresqlDatabase`
Fixes #12294
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=158349177
Co-authored-by: SID <158349177+0xsid0703@users.noreply.github.com>
### What problem does this PR solve?
```
$ python admin/client/ragflow_cli.py -t user -u aaa@aaa.com -p 9380
ragflow> list datasets;
ragflow> list default models;
ragflow> show version;
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
This PR extends the RAGFlow Admin API and CLI with comprehensive user
API token management capabilities. Administrators can now generate,
list, and delete API tokens for users through both the REST API and the
Admin CLI interface.
## Changes
### Backend API (`admin/server/`)
#### New Endpoints
- **POST `/api/v1/admin/users/<username>/new_token`** - Generate a new
API token for a user
- **GET `/api/v1/admin/users/<username>/token_list`** - List all API
tokens for a user
- **DELETE `/api/v1/admin/users/<username>/token/<token>`** - Delete a
specific API token for a user
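A hedged usage example with `requests` (the base URL, port, admin auth header, and response shapes are assumptions; adapt them to your deployment):

```python
import requests
from urllib.parse import quote

BASE = "http://localhost:9380/api/v1/admin"          # assumed admin server address
HEADERS = {"Authorization": "Bearer <admin-token>"}   # assumed admin auth scheme

# Generate a new API token for a user
resp = requests.post(f"{BASE}/users/alice@example.com/new_token", headers=HEADERS)
print(resp.json())  # standard format: {"code": 0, "data": {...}, "message": ...}

# List the user's API tokens
resp = requests.get(f"{BASE}/users/alice@example.com/token_list", headers=HEADERS)
tokens = resp.json().get("data", [])

# Delete a specific token (URL-encode it to handle special characters)
if tokens:
    token = tokens[0]["token"]
    requests.delete(f"{BASE}/users/alice@example.com/token/{quote(token, safe='')}",
                    headers=HEADERS)
```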
#### Service Layer Updates (`services.py`)
- Added `get_user_api_key(username)` - Retrieves all API tokens for a
user
- Added `save_api_token(api_token)` - Saves a new API token to the
database
- Added `delete_api_token(username, token)` - Deletes an API token for a
user
### Admin CLI (`admin/client/`)
#### New Commands
- **`GENERATE TOKEN FOR USER <username>;`** - Generate a new API token
for the specified user
- **`LIST TOKENS OF <username>;`** - List all API tokens associated with
a user
- **`DROP TOKEN <token> OF <username>;`** - Delete a specific API token
for a user
### Testing
Added comprehensive test suite in `test/testcases/test_admin_api/`:
- **`test_generate_user_api_key.py`** - Tests for API token generation
- **`test_get_user_api_key.py`** - Tests for listing user API tokens
- **`test_delete_user_api_key.py`** - Tests for deleting API tokens
- **`conftest.py`** - Shared test fixtures and utilities
## Technical Details
### Token Generation
- Tokens are generated using `generate_confirmation_token()` utility
- Each token includes metadata: `tenant_id`, `token`, `beta`,
`create_time`, `create_date`
- Tokens are associated with user tenants automatically
### Security Considerations
- All endpoints require admin authentication (`@check_admin_auth`)
- Tokens are URL-encoded when passed in DELETE requests to handle
special characters
- Proper error handling for unauthorized access and missing resources
### API Response Format
All endpoints follow the standard RAGFlow response format:
```json
{
"code": 0,
"data": {...},
"message": "Success message"
}
```
## Files Changed
- `admin/client/admin_client.py` - CLI token management commands
- `admin/server/routes.py` - New API endpoints
- `admin/server/services.py` - Token management service methods
- `docs/guides/admin/admin_cli.md` - CLI documentation updates
- `test/testcases/test_admin_api/conftest.py` - Test fixtures
- `test/testcases/test_admin_api/test_user_api_key_management/*` - Test
suites
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Alexander Strasser <alexander.strasser@ondewo.com>
Co-authored-by: Hetavi Shah <your.email@example.com>
### What problem does this PR solve?
Skip duplicate errors to avoid 'create_idx' failures caused by slow
metadata refresh or external modifications.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes Infinity-specific API regressions: preserves the `important_kwd`
round-trip for `[""]`, restores the required highlight key in retrieval
responses, and enforces Infinity guards for unsupported
`parser_id=tag` and pagerank in `/v1/kb/update`. Also removes a
slow/buggy pandas row-wise apply that was throwing `ValueError` and
causing flakiness.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
This commit fixes multiple issues preventing PDF Generator (Docs
Generator) output variables from being visible in the Output section and
available to downstream nodes.
### What problem does this PR solve?
Issues Fixed:
1. PDF Generator nodes initialized with empty object instead of proper
initial values
2. Output structure mismatch (had 'value' property that system doesn't
expect)
3. Missing 'download' output in form schema
4. Output list computed from static values instead of form state
5. Added null/undefined guard to transferOutputs function
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Changes:
- web/src/pages/agent/constant/index.tsx: Fixed output structure in
initialPDFGeneratorValues
- web/src/pages/agent/hooks/use-add-node.ts: Initialize PDF Generator
with proper values
- web/src/pages/agent/form/pdf-generator-form/index.tsx: Fixed schema
and use form.watch
- web/src/pages/agent/form/components/output.tsx: Added null guard and
spacing
### What problem does this PR solve?
Fix: In the agent loop, if the await response is selected as the
variable, the operator cannot be selected. #12656
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: duplicate content in chunk #12336
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes web API behavior mismatches that caused test failures by
normalizing error responses, tightening validations, correcting error
messages, and closing upload file handles.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix shell variable expansion to preserve $ in password defaults when
env vars are unset. Fixes Azure RDS auto-rotated passwords (that contain
$) being
truncated during template processing.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Modified and optimized the metadata condition card component.
Fix: Use startOfDay and endOfDay to ensure the date range includes a
full day.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
The \`important_kwd\` field in Infinity connector was using mismatched
separators:
- **Storage**: \`list2str(v)\` uses space as default separator
- **Reading**: \`v.split()\` splits by all whitespace
This causes multi-word keywords like \`\"Senior Fund Manager\"\` to be
incorrectly split into \`[\"Senior\", \"Fund\", \"Manager\"]\`.
## Solution
Use comma \`,\` as separator for both storing and reading, consistent
with:
1. The LLM output format in \`keyword_prompt.md\` (\"delimited by
ENGLISH COMMA\")
2. The \`cached.split(\",\")\` in \`task_executor.py\`
## Changes
- \`insert()\`: \`list2str(v)\` → \`list2str(v, \",\")\`
- \`update()\`: \`list2str(v)\` → \`list2str(v, \",\")\`
- \`get_fields()\`: \`v.split()\` → \`v.split(\",\") if v else []\`
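A minimal sketch of the corrected round trip, with a simplified stand-in for `list2str`:

```python
def list2str(values, sep=" "):
    """Join a list of strings with the given separator (simplified stand-in)."""
    return sep.join(values)

keywords = ["Senior Fund Manager", "Portfolio Risk"]

# Store with an explicit comma separator...
stored = list2str(keywords, ",")             # "Senior Fund Manager,Portfolio Risk"

# ...and read back with the matching separator, guarding the empty-string case.
restored = stored.split(",") if stored else []
assert restored == keywords
```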
## Impact
This bug affects:
- Python-level reranking weight calculation (`important_kwd * 5`)
- API response keyword display
- Search precision due to fragmented keywords
### What problem does this PR solve?
Fix: Editing the agent greeting causes the greeting to be continuously
added to the message list. #12635
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Fixes #12520 - Deleted chunks should not appear in retrieval/reference
results.
## Changes
### Core Fix
- **api/apps/chunk_app.py**: Include `doc_id` in the delete condition to
properly scope the delete operation (see the sketch below)
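A hedged sketch of the scoped deletion (the connector signature here is illustrative, not the exact doc-store API):

```python
def delete_document_chunks(doc_store_conn, index_name, kb_id, doc_id, chunk_ids):
    """Delete chunks by id, scoped to the owning document so nothing outside it is touched."""
    condition = {"id": chunk_ids, "doc_id": doc_id}  # compound condition: ids AND doc_id
    return doc_store_conn.delete(condition, index_name, kb_id)
```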
### Improved Error Handling
- **api/db/services/document_service.py**: Better separation of concerns
with individual try-catch blocks and proper logging for each cleanup
operation
### Doc Store Updates
- **rag/utils/es_conn.py**: Updated delete query construction to support
compound conditions
- **rag/utils/opensearch_conn.py**: Same updates for OpenSearch
compatibility
### Tests
- **test/testcases/.../test_retrieval_chunks.py**: Added
`TestDeletedChunksNotRetrievable` class with regression tests
- **test/unit/test_delete_query_construction.py**: Unit tests for delete
query construction
## Testing
- Added regression tests that verify deleted chunks are not returned by
retrieval API
- Tests cover single chunk deletion and batch deletion scenarios
### What problem does this PR solve?
Fix regex pattern validation in split_with_pattern (#12605)
- Add try-except block to validate user-provided regex patterns before
use
- Gracefully fall back to a single chunk when an invalid regex is provided
- Prevent server crash during DOCX parsing with malformed delimiters
## Problem
Parsing DOCX files with custom regex delimiters crashes with `re.error:
nothing to repeat at position 9` when users provide invalid regex
patterns.
Closes #12605
## Solution
Validate and compile regex pattern before use. On invalid pattern, log
warning and return content as single chunk instead of crashing.
## Changes
- `rag/nlp/__init__.py`: Add regex validation in `split_with_pattern()`
function
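A minimal sketch of the validation (simplified; the real `split_with_pattern()` handles more options):

```python
import logging
import re

def split_with_pattern(content: str, pattern: str) -> list[str]:
    """Split content by a user-provided regex, falling back to a single chunk on bad patterns."""
    try:
        compiled = re.compile(pattern)
    except re.error as e:
        logging.warning("Invalid delimiter pattern %r (%s); returning content as a single chunk.",
                        pattern, e)
        return [content]
    return [piece for piece in compiled.split(content) if piece]
```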
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=42954461
### What problem does this PR solve?
Fixes #12570 - The slicing method dropdown was empty when deploying
RAGFlow v0.23.1 from source code.
The issue occurred because `parser_ids` from the tenant info was empty
or undefined, causing `useSelectParserList` to return an empty array.
This PR adds a fallback to a default parser list when `parser_ids` is
empty, ensuring the dropdown always has options.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=94194147
### What problem does this PR solve?
Feat: Hash doc id to avoid duplicate name.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Description
Fixes connection error handling when langfuse service is unavailable.
The application now gracefully handles connection failures instead of
crashing.
## Changes
- Wrapped `langfuse.auth_check()` calls in try-except blocks in:
- `api/db/services/dialog_service.py`
- `api/db/services/tenant_llm_service.py`
## Problem
When langfuse service is unavailable or connection is refused,
`langfuse.auth_check()` throws `httpx.ConnectError: [Errno 111]
Connection refused`, causing the application to crash during document
parsing or dialog operations.
## Solution
Added try-except blocks around `langfuse.auth_check()` calls to catch
connection errors and gracefully skip langfuse tracing instead of
crashing. The application continues functioning normally even when
langfuse is unavailable.
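A minimal sketch of the guard, simplified from the two call sites:

```python
import logging

def langfuse_is_reachable(langfuse) -> bool:
    """Return True only if the Langfuse credentials check succeeds; never raise."""
    try:
        return bool(langfuse.auth_check())
    except Exception as e:  # e.g. httpx.ConnectError when the service is down
        logging.warning("Langfuse unavailable, skipping tracing: %s", e)
        return False
```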
## Related Issue
Fixes #12621
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=158349177
### What problem does this PR solve?
Fix: Fix the styles of the multi-select component and the filter pop-up.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes#12604 - DOCX files containing hyperlinks to internal bookmarks
(e.g., `#_文档目录`) cause a `KeyError` during parsing:
```
KeyError: "There is no item named 'word/#_文档目录' in the archive"
```
This happens because python-docx incorrectly tries to read internal
bookmark references as files from the ZIP archive. Internal bookmarks
are relationship targets starting with `#` and are not actual files.
This PR extends the existing `load_from_xml_v2` workaround (which
already handles `NULL` targets) to also skip relationship targets
starting with `#`.
Related upstream issue:
https://github.com/python-openxml/python-docx/issues/902
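A minimal sketch of the extended check (the helper name is hypothetical; in the actual patch the guard lives inside the relationship-loading loop of the patched loader):

```python
def is_loadable_target(target_ref) -> bool:
    """Return False for relationship targets that are not real parts in the DOCX archive."""
    if not target_ref or target_ref == "NULL":
        return False                      # existing workaround for NULL targets
    if target_ref.startswith("#"):        # internal bookmark, e.g. "#_文档目录"
        return False                      # new case handled by this PR
    return True
```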
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=94194147