Commit Graph

1013 Commits

Author SHA1 Message Date
f98b24c9bf Move api.settings to common.settings (#11036)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 09:36:38 +08:00
17ea5c1dee Fix: MCP cannot handle empty Auth field properly (#11034)
### What problem does this PR solve?

Fix MCP failing to handle an empty Auth field, which resulted in the following errors:

```bash
2025-11-05 11:10:41,919 INFO     51209 Negotiated protocol version: 2025-06-18
2025-11-05 11:10:41,920 INFO     51209 client_session initialized successfully
2025-11-05 11:10:41,994 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:10:41] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:10:41,999 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:10:42,000 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:10:42,001 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:10:42] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:11:30,441 INFO     51209 Negotiated protocol version: 2025-06-18
2025-11-05 11:11:30,442 INFO     51209 client_session initialized successfully
2025-11-05 11:11:30,520 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:30] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:11:30,525 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:11:30,526 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:11:30,527 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:30] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:11:31,476 INFO     51209 Negotiated protocol version: 2025-06-18
2025-11-05 11:11:31,476 INFO     51209 client_session initialized successfully
2025-11-05 11:11:31,549 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:31] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:11:31,552 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:11:31,553 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:11:31,554 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:31] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:11:51,930 ERROR    51209 unhandled errors in a TaskGroup (1 sub-exception)
  + Exception Group Traceback (most recent call last):
  |   File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 86, in _mcp_server_loop
  |     async with streamablehttp_client(url, headers) as (read_stream, write_stream, _):
  |   File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 217, in __aexit__
  |     await self.gen.athrow(typ, value, traceback)
  |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/mcp/client/streamable_http.py", line 478, in streamablehttp_client
  |     async with anyio.create_task_group() as tg:
  |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 781, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/mcp/client/streamable_http.py", line 409, in handle_request_async
    |     await self._handle_post_request(ctx)
    |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/mcp/client/streamable_http.py", line 278, in _handle_post_request
    |     response.raise_for_status()
    |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/httpx/_models.py", line 829, in raise_for_status
    |     raise HTTPStatusError(message, request=request, response=self)
    | httpx.HTTPStatusError: Server error '502 Bad Gateway' for url 'http://192.168.1.38:9382/mcp'
    | For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/502
    +------------------------------------
2025-11-05 11:11:51,942 ERROR    51209 Error fetching tools from MCP server: streamable-http: http://192.168.1.38:9382/mcp
Traceback (most recent call last):
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 168, in get_tools
    return future.result(timeout=timeout)
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._get_tools_from_mcp_server) at 0x7d58f02e2c20>", line 40, in _get_tools_from_mcp_server
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 160, in _get_tools_from_mcp_server
    result: ListToolsResult = await self._call_mcp_server("list_tools", timeout=timeout)
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._call_mcp_server) at 0x7d58f02e2b00>", line 63, in _call_mcp_server
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 139, in _call_mcp_server
    raise result
ValueError: Connection failed (possibly due to auth error). Please check authentication settings first
2025-11-05 11:11:51,943 ERROR    51209 Test MCP error: Connection failed (possibly due to auth error). Please check authentication settings first
Traceback (most recent call last):
  File "/home/xxxxxxxxx/workspace/ragflow/api/apps/mcp_server_app.py", line 429, in test_mcp
    tools = tool_call_session.get_tools(timeout)
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession.get_tools) at 0x7d58f02e2cb0>", line 40, in get_tools
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 168, in get_tools
    return future.result(timeout=timeout)
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._get_tools_from_mcp_server) at 0x7d58f02e2c20>", line 40, in _get_tools_from_mcp_server
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 160, in _get_tools_from_mcp_server
    result: ListToolsResult = await self._call_mcp_server("list_tools", timeout=timeout)
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._call_mcp_server) at 0x7d58f02e2b00>", line 63, in _call_mcp_server
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 139, in _call_mcp_server
    raise result
ValueError: Connection failed (possibly due to auth error). Please check authentication settings first
2025-11-05 11:11:51,944 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:11:51,945 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:11:51,946 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:51] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:12:20,484 INFO     51209 Negotiated protocol version: 2025-06-18
2025-11-05 11:12:20,485 INFO     51209 client_session initialized successfully
2025-11-05 11:12:20,570 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:12:20] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:12:20,573 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:12:20,574 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:12:20,575 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:12:20] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:15:02,119 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:15:02] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:16:24,967 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:16:24] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:30:24,284 ERROR    51209 Task was destroyed but it is pending!
task: <Task pending name='Task-58' coro=<MCPToolCallSession._mcp_server_loop() running at <@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._mcp_server_loop) at 0x7d58f02e29e0>:11> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_chain_future.<locals>._call_set_state() at /home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/futures.py:392]>
2025-11-05 11:30:24,285 ERROR    51209 Task was destroyed but it is pending!
task: <Task pending name='Task-67' coro=<Queue.get() running at /home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/queues.py:159> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_release_waiter(<Future pendi...ask_wakeup()]>)() at /home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/tasks.py:387]>
Exception ignored in: <coroutine object Queue.get at 0x7d585480ace0>
Traceback (most recent call last):
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/queues.py", line 161, in get
    getter.cancel()  # Just in case getter is not done yet.
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

```
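
The fix itself is not shown in this log. As a rough illustration only, one way to avoid sending an empty credential is to drop blank header values before opening the connection. The helper name `build_mcp_headers` and the `Authorization` key are assumptions made for this sketch, not RAGFlow's actual code.

```python
from typing import Dict, Optional


def build_mcp_headers(raw_headers: Optional[Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Return a copy of raw_headers without empty or whitespace-only values.

    Hypothetical helper: the name and behaviour are assumptions for illustration.
    """
    cleaned: Dict[str, str] = {}
    for name, value in (raw_headers or {}).items():
        if value is None:
            continue  # field was never filled in
        value = str(value).strip()
        if value:  # skip "" and whitespace-only values
            cleaned[name] = value
    return cleaned
```

With this shape, an empty Auth field simply yields no `Authorization` header rather than a header with a blank value.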

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-05 19:15:27 +08:00
121c51661d Fix: Markdown table extractor (#11018)
### What problem does this PR solve?

The Markdown table extractor now supports `<table ...>` tags, including those with attributes. #10966
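
A minimal sketch of what matching an opening tag with attributes can look like, assuming a regex-based extractor. The names `HTML_TABLE_RE` and `extract_html_tables` are hypothetical and the pattern is not taken from the PR.

```python
import re
from typing import List

# Matches <table>, <table class="x">, <TABLE border=1>, and so on, up to the
# nearest closing tag. Nested tables are not handled by this simple pattern.
HTML_TABLE_RE = re.compile(r"<table\b[^>]*>.*?</table\s*>", re.IGNORECASE | re.DOTALL)


def extract_html_tables(markdown_text: str) -> List[str]:
    """Return every HTML table block found in the given Markdown text."""
    return HTML_TABLE_RE.findall(markdown_text)
```

The `\b` after `table` keeps tags such as `<tablet>` from matching, and `[^>]*` is what lets attributes through.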

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-05 16:10:21 +08:00
02d10f8eda Move var from rag.settings to common.globals (#11022)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 15:48:50 +08:00
dddf766470 Feat: start data sync service. (#11026)
### What problem does this PR solve?

#10953 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-05 15:43:15 +08:00
b86e07088b Fix: escape multi-steps issues. (#11016)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-05 14:51:00 +08:00
1a9215bc6f Move some vars to globals (#11017)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 14:14:38 +08:00
cf9611c96f Feat: Support more chunking methods (#11000)
### What problem does this PR solve?

Feat: Support more chunking methods #10772 

This PR enables multiple chunking methods — including books, laws,
naive, one, and presentation — to be used with all existing PDF parsers
(DeepDOC, MinerU, Docling, TCADP, Plain Text, and Vision modes).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-05 13:00:42 +08:00
96c015fb85 Fix and refactor imports (#11010)
### What problem does this PR solve?

1. Move EMBEDDING_CFG to common.globals
2. Fix broken imports
3. Move signal handlers to common/signal_utils.py

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 11:07:54 +08:00
bab3fce136 Move some constants to common (#11004)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 08:01:39 +08:00
4bbbf92331 Refa: link connector to KB. (#10991)
### What problem does this PR solve?

#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-04 20:13:52 +08:00
880a6a0428 Move some enumerate type to constants.py (#10998)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-04 19:25:25 +08:00
465a140727 Feat: refine Confluence connector (#10994)
### What problem does this PR solve?

Refine Confluence connector.
#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2025-11-04 17:29:11 +08:00
2677617f93 Feat: supports MinerU http-client/server method (#10961)
### What problem does this PR solve?

Add support for MinerU http-client/server method.

To use MinerU with vLLM server:

1. Set up a vLLM server running MinerU:
   ```bash
   mineru-vllm-server --port 30000
   ```

2. Configure the following environment variables:
   - `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru` (or the path to
     your MinerU executable)
   - `MINERU_BACKEND="vlm-http-client"`
   - `MINERU_SERVER_URL="http://your-vllm-server-ip:30000"`

3. Follow the standard MinerU setup steps as described above.

With this configuration, RAGFlow will connect to your vLLM server to
perform document parsing, which can significantly improve parsing
performance for complex documents while reducing the resource
requirements on your RAGFlow server.
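
As a sketch of how these three variables fit together, the snippet below reads them and checks that a server URL is present when the HTTP client backend is selected. The function `read_mineru_settings` is hypothetical; RAGFlow's own loading code may differ.

```python
import os
from typing import Dict, Mapping, Optional


def read_mineru_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the MinerU variables named above and sanity-check them.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    settings = {
        "executable": env.get("MINERU_EXECUTABLE", ""),
        "backend": env.get("MINERU_BACKEND", ""),
        "server_url": env.get("MINERU_SERVER_URL", ""),
    }
    # The HTTP client backend has nothing to talk to without a server URL.
    if settings["backend"] == "vlm-http-client" and not settings["server_url"]:
        raise ValueError("MINERU_SERVER_URL must be set when MINERU_BACKEND is vlm-http-client")
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in a test.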



![1](https://github.com/user-attachments/assets/46624a0c-0f3b-423e-ace8-81801e97a27d)

![2](https://github.com/user-attachments/assets/66ccc004-a598-47d4-93cb-fe176834f83b)


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
2025-11-04 16:03:30 +08:00
16d2be623c Minor tweaks (#10987)
### What problem does this PR solve?

1. Rename identifiers
2. Fix some return statements
3. Fix some typos

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-04 14:15:31 +08:00
1e45137284 Move 'timeout' to common folder (#10983)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-04 11:51:12 +08:00
c20f5675c6 Fix: elasticsearch connection hardcoded (#10975)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/10930

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-04 10:59:35 +08:00
378bdfccfc Refactor log utils (#10973)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 20:25:02 +08:00
3e5a39482e Feat: Support multiple data sources synchronizations (#10954)
### What problem does this PR solve?
#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-03 19:59:18 +08:00
9a486e0f51 Move some funcs from api to rag module (#10972)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 19:26:09 +08:00
fd4aa79c07 Fix:missing embedding vector on Tokenizer (#10964)
### What problem does this PR solve?
Issue: [#10890](https://github.com/infiniflow/ragflow/issues/10890)

Change: fix the missing embedding vector on Tokenizer

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-03 19:17:05 +08:00
2d83c64eed Fix:wrong describe_with_prompt() in ollama (#10963)
### What problem does this PR solve?

Change: fix the wrong `describe_with_prompt()` in Ollama

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-03 19:16:41 +08:00
076d811086 Introduce common/config_utils.py (#10968)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 17:25:06 +08:00
d008a4df9f Move base64_image related functions to common directory (#10957)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 15:20:46 +08:00
78631a3fd3 Move some functions out of 'api/utils/common.py' (#10948)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 12:34:47 +08:00
4117f41758 Fix: decode error in email parser app (#10920)
### What problem does this PR solve?

Fix: UnicodeDecodeError: 'gb2312' codec can't decode byte 0xab in
position 560: illegal multibyte sequence.
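
One common way to avoid this class of error, shown here only as a sketch, is to retry with a superset codec before giving up. The helper `decode_with_fallback` and its fallback order are assumptions, not the change made in this PR.

```python
from typing import Optional

# GB18030 is a superset of GB2312 and GBK, so it can decode bytes that a
# strict gb2312 codec rejects.
_SUPERSETS = {"gb2312": "gb18030", "gbk": "gb18030"}


def decode_with_fallback(data: bytes, declared_charset: Optional[str]) -> str:
    """Decode data with the declared charset, then progressively wider fallbacks."""
    declared = (declared_charset or "utf-8").lower()
    candidates = [declared]
    if declared in _SUPERSETS:
        candidates.append(_SUPERSETS[declared])
    if "utf-8" not in candidates:
        candidates.append("utf-8")
    for charset in candidates:
        try:
            return data.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset: try the next candidate
    # Last resort: keep the message readable instead of raising.
    return data.decode("utf-8", errors="replace")
```

Mail headers often declare `gb2312` while the body actually uses characters from the wider GBK or GB18030 repertoire, which is why the superset is tried before UTF-8.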

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-03 12:31:06 +08:00
061d8f78e5 Feat: location rule for http (#10901)
### What problem does this PR solve?

Location rule for http.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-03 11:01:24 +08:00
33371cda11 Fix:output_structure in agent (#10907)
### What problem does this PR solve?
Change: fix `output_structure` in agent

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-03 09:39:53 +08:00
fa210e7c58 Feat: parsing hyperlinks in docx and pdf & Fix: default parser config of toc extraction (#10877)
### What problem does this PR solve?

Feat: parsing hyperlinks in docx and pdf #10848
Fix: default parser config of toc extraction

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-03 09:34:12 +08:00
360f5c1179 Move token related functions to common (#10942)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 08:50:05 +08:00
44f2d6f5da Move 'get_project_base_directory' to common directory (#10940)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-02 21:05:28 +08:00
6447b737ab Move singleton to common directory (#10935)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-02 12:24:08 +08:00
fe4852cb71 TEI auto truncate inputs (#10916)
### What problem does this PR solve?

TEI auto truncate inputs
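
The request change is not shown in this log. The sketch below assumes it corresponds to the `truncate` field of the Text Embeddings Inference `/embed` request body; the helper name `build_tei_embed_payload` is hypothetical.

```python
import json
from typing import List


def build_tei_embed_payload(texts: List[str], truncate: bool = True) -> str:
    """Serialize an /embed request body that asks the server to truncate."""
    # With truncate enabled, the server shortens over-long inputs itself
    # instead of rejecting the request.
    return json.dumps({"inputs": texts, "truncate": truncate})
```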

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-31 16:46:20 +08:00
f52e56c2d6 Remove 'get_lan_ip' and add common misc_utils.py (#10880)
### What problem does this PR solve?

Add get_uuid, download_img and hash_str2int into misc_utils.py

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-31 16:42:01 +08:00
0ecccd27eb Refactor:improve the logic for rerank models to cal the total token count (#10882)
### What problem does this PR solve?

Improve the logic that rerank models use to calculate the total token count.

### Type of change

- [x] Refactoring
2025-10-31 09:46:16 +08:00
ab52ffc9c0 Fix: law parser (#10897)
### What problem does this PR solve?

Fix: law parser  #10888

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 19:00:11 +08:00
5f65c7f48e Fix: video parser should follow selected VLM in pipeline (#10900)
### What problem does this PR solve?

Video parser should follow selected VLM, rather than default one.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 17:59:50 +08:00
bb9504d1cc Fix:enhance delimiters in markdown parser (#10896)
### What problem does this PR solve?
Issue: [#10890](https://github.com/infiniflow/ragflow/issues/10890)

Change: enhance delimiters in the Markdown parser

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 17:36:51 +08:00
b52f09adfe Mineru api support (#10874)
### What problem does this PR solve?

Support a local MinerU API from a Docker instance, for example when WSL
on Windows has no GPU but a MinerU API with GPU support is available.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-10-30 17:31:46 +08:00
27f0d82102 Fix: opensearch retrieval error (#10891)
### What problem does this PR solve?

Fix: opensearch retrieval error #10828

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 17:30:54 +08:00
a3bb4aadcc Fix: predictable token generation (#10868)
### What problem does this PR solve?

Fix predictable token generation.
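
The PR's actual change is not shown here. As a general illustration, unpredictable tokens in Python are usually drawn from the standard `secrets` module; the function name `generate_token` is an assumption for this sketch.

```python
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from num_bytes of randomness."""
    # secrets draws from the operating system's CSPRNG, unlike the
    # `random` module, whose output can be predicted from observed values.
    return secrets.token_urlsafe(num_bytes)
```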

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 09:31:36 +08:00
c0c2a10680 Feat: allow initialize Redis without password (#10856)
### What problem does this PR solve?

Allow initializing Redis without a password.
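
A minimal sketch of the idea, assuming the connection settings arrive as a mapping: the password is passed on only when one is configured. The helper `redis_connection_kwargs` and the config keys (`host`, `port`, `password`) are hypothetical.

```python
from typing import Any, Dict, Mapping


def redis_connection_kwargs(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for a Redis client from a config mapping."""
    kwargs: Dict[str, Any] = {
        "host": config.get("host", "localhost"),
        "port": int(config.get("port", 6379)),
    }
    password = config.get("password")
    if password:  # None or "" means the server runs without authentication
        kwargs["password"] = password
    return kwargs
```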

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-29 09:45:28 +08:00
d86d7061ea Refactor: Improve how to get total token count for AnthropicCV (#10658)
### What problem does this PR solve?

Improve how the total token count is obtained for AnthropicCV.

### Type of change

- [x] Refactoring
2025-10-29 09:41:15 +08:00
84d1ffe44c Feature/add new models for token pony and bug fix for use llm (#10823)
New models for Token Pony and a bug fix for LLM usage.

Co-authored-by: huangzl <huangzl@shinemo.com>
2025-10-28 10:04:41 +08:00
766d900a41 Refactor: rename rmSpace to remove_redundant_spaces (#10796)
### What problem does this PR solve?

- rename rmSpace to remove_redundant_spaces
- move clean_markdown_block to common module
- add unit tests for remove_redundant_spaces and clean_markdown_block

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-28 09:46:32 +08:00
e59458c36b Fix: parsing excel with chartsheet & Clamp begin to a minimum of 0 to prevent negative indexing (#10819)
### What problem does this PR solve?

Fix: parsing excel with chartsheet #10815

Fix: Clamp begin to a minimum of 0 to prevent negative indexing #10804
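
To see why the clamp matters: in Python a negative start index counts from the end of the sequence, so a slice can silently return the wrong rows or none at all. The function `window` below is a hypothetical illustration, not the code touched by this PR.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def window(rows: Sequence[T], begin: int, size: int) -> List[T]:
    """Return up to size rows starting at begin, never indexing from the end."""
    begin = max(0, begin)  # a negative begin would otherwise count from the end
    return list(rows[begin:begin + size])
```

For a four-row list, `rows[-2:1]` is empty, whereas `window(rows, -2, 3)` returns the first three rows.
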
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-28 09:40:37 +08:00
5acc407240 Feat: MinerU supports VLM-Transfomers backend (#10809)
### What problem does this PR solve?

MinerU now supports the VLM-Transformers backend.

Set `MINERU_BACKEND` to choose the backend, for example
`MINERU_BACKEND="pipeline"`. (Options: pipeline | vlm-transformers; the
default is pipeline.)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-10-27 17:04:13 +08:00
33a189f620 Feat: add TCADP Parser (#10775)
### What problem does this PR solve?

This PR adds a new TCADP (Tencent Cloud Advanced Document Processing)
parser to RAGFlow, enabling users to leverage Tencent Cloud's document
parsing capabilities for more accurate and structured document
processing. The implementation includes:

- New TCADP Parser: A complete implementation of Tencent Cloud's document
  parsing API without SDK dependency
- Configuration Support: Added configuration options in service_conf.yaml
  for Tencent Cloud API credentials
- Frontend Integration: Updated UI components to support the new TCADP
  parser option
- Error Handling: Comprehensive error handling and retry mechanisms for
  API calls
- Result Processing: Support for both SSE streaming and JSON response
  formats from Tencent Cloud API

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-10-27 15:14:58 +08:00
56def59c2b Fix:Error retrieving DOCX image (docx.image.exceptions.UnrecognizedImageError) (#10794)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/10776

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-10-27 13:23:16 +08:00
3bd0b99495 Fix: gemini cv model chat issue. (#10799)
### What problem does this PR solve?

#10787
#10781

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-27 11:43:56 +08:00