Commit Graph

2818 Commits

Author SHA1 Message Date
67dee2d74e Fix: fix retrieval testing wrong pagination (#7174)
### What problem does this PR solve?

Fix retrieval testing wrong pagination. #7171 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-22 15:16:04 +08:00
bcac195a0c Put the knowledge base list related hooks into use-knowledge-request.ts #3221 (#7197)
### What problem does this PR solve?

Put the knowledge base list related hooks into use-knowledge-request.ts
#3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 15:01:35 +08:00
8fca8faa7d Feat: Move langfuse configuration to api page #6155 (#7196)
### What problem does this PR solve?

Feat: Move langfuse configuration to api page #6155

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 14:08:20 +08:00
1cc17eb611 Feat: Filter the knowledge base list using owner #3221 (#7191)
### What problem does this PR solve?

Feat: Filter the knowledge base list using owner #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 13:44:41 +08:00
c8194f5fd0 refactor: Update Redis configuration to use StatefulSet instead of deployment with pvc (#7187)
### What problem does this PR solve?

This PR changes Redis to be a StatefulSet. In some situations, when the
Redis pod gets rescheduled to another node, it gets stuck in the pending
state because its PVC is still attached to another Kubernetes node.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [X] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-22 12:53:30 +08:00
f2c9ffc056 Fix: KG search issue. (#7186)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-22 12:10:30 +08:00
10432a1be7 Refa: Optimize pptx shape extraction to reduce content loss (#6703)
### What problem does this PR solve?

When parsing pptx files, some shapes do not contain the `shape_type`
attribute, which causes the original code to throw an exception during
extraction, leading to failure in content extraction. This optimization
introduces handling logic for such anomalous shapes, providing a safer
and more robust processing mechanism.
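
A minimal sketch of the kind of defensive handling described above, assuming a shape is a plain object whose `shape_type` attribute may be missing or may raise when read. `FakeShape`, `safe_shape_type`, and `extract_texts` are hypothetical names used for illustration; they are not the code from this PR.

```python
from typing import Any, Iterable, List, Optional, Tuple


def safe_shape_type(shape: Any) -> Optional[Any]:
    """Return shape.shape_type, or None when it is missing or unreadable."""
    try:
        return shape.shape_type
    except (AttributeError, NotImplementedError):
        # Anomalous shape: report the type as unknown instead of aborting.
        return None


def extract_texts(shapes: Iterable[Any]) -> List[Tuple[Optional[Any], str]]:
    """Collect (shape_type, text) for every shape that carries text."""
    collected = []
    for shape in shapes:
        text = getattr(shape, "text", "")
        if text:
            collected.append((safe_shape_type(shape), text))
    return collected


class FakeShape:
    """Hypothetical stand-in for a slide shape (not a python-pptx class)."""

    def __init__(self, text: str, shape_type: Any = None, has_type: bool = True):
        self.text = text
        if has_type:
            self.shape_type = shape_type


shapes = [FakeShape("title", shape_type=14), FakeShape("orphan", has_type=False)]
result = extract_texts(shapes)
```

With this shape of code, the text of the shape without a `shape_type` is still kept, so one anomalous shape no longer discards the content of the whole slide.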

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
2025-04-22 10:16:24 +08:00
e7f83b13ca Feat: Rename a dataset #3221 (#7162)
### What problem does this PR solve?

Feat: Rename a dataset #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 10:09:41 +08:00
ad220a0a3c Feat: add mcp self-host mode (#7157)
### What problem does this PR solve?

Add mcp self-host mode, a complement of #7084.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-22 10:04:21 +08:00
91c5a5c08f Docs: add mcp self-host mode (#7163)
### What problem does this PR solve?

Add mcp self-host mode documentation, a complement of #7141.

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-04-22 10:03:38 +08:00
8362ab405c Fix: don't modify S3 file name when not using prefix_path (#7152)
### What problem does this PR solve?

Hello, I encountered a problem when trying to use an S3 backend
(seaweedfs) for storage in RAGFlow: when calling
`STORAGE_IMPL.get("bucket", "key")`, the actual request sent to S3 is
`bucket/bucket/key`, causing a `NoSuchKey` error.

I compared the code in `s3_conn.py` to `minio_conn.py` and
`oss_conn.py`, then decided to remove the `else` branch in
`use_prefix_path` method, and it works. I didn't configure `prefix_path`
or `bucket` in `s3` section of the `service_conf.yaml`.

I think this is a bug, but not sure.
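
The intended behaviour can be sketched as follows, assuming the key should only be rewritten when a prefix is explicitly configured. `resolve_location` is a hypothetical helper written for illustration; the real `use_prefix_path` method in `s3_conn.py` may be structured differently.

```python
from typing import Optional, Tuple


def resolve_location(bucket: str, key: str,
                     prefix_path: Optional[str] = None) -> Tuple[str, str]:
    """Return the (bucket, key) pair to send to the object store."""
    if prefix_path:
        # Only rewrite the key when a prefix is explicitly configured.
        key = f"{prefix_path.strip('/')}/{key}"
    # No prefix configured: the key is passed through unchanged, so the
    # bucket name is never prepended to it.
    return bucket, key


plain = resolve_location("bucket", "key")
prefixed = resolve_location("bucket", "key", prefix_path="data/")
```

Without a configured prefix, the request targets `bucket` and `key` as given, rather than `bucket/bucket/key`.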

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-21 11:55:50 +08:00
68b9dae6c0 Feat: mcp server (#7084)
### What problem does this PR solve?

Add MCP support with a client example.

Issue link: #4344

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-21 09:43:20 +08:00
9b956ac1a9 Docs: MCP server (#7141)
### What problem does this PR solve?

Documentation for MCP server

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-04-21 09:42:32 +08:00
d4dbdfb61d feat: Recover pending tasks while pod restart. (#7073)
### What problem does this PR solve?

If you deploy Ragflow using Kubernetes, the hostname will change during
a rolling update. This causes the consumer name of the task executor to
change, making it impossible to schedule tasks that were previously in a
pending state.
To address this, I introduced a recovery task that scans these pending
messages and re-publishes them, allowing the tasks to continue being
processed.
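
A rough sketch of the recovery idea, using an in-memory stand-in for the message queue rather than the Redis API. `recover_pending`, the `pending` mapping, and the consumer names are hypothetical; the actual recovery task in this PR may work differently.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple


def recover_pending(pending: Dict[str, Tuple[str, dict]],
                    live_consumers: Set[str],
                    queue: Deque[dict]) -> int:
    """Re-publish messages that are pending for consumers that no longer exist."""
    recovered = 0
    for msg_id in list(pending):
        consumer, payload = pending[msg_id]
        if consumer not in live_consumers:
            queue.append(payload)   # re-publish so any live consumer can take it
            del pending[msg_id]     # drop the stale delivery record
            recovered += 1
    return recovered


# Message id -> (consumer that received it, task payload).
pending = {
    "1-0": ("task_executor_old-pod", {"task": "parse-doc-1"}),
    "2-0": ("task_executor_new-pod", {"task": "parse-doc-2"}),
}
queue: Deque[dict] = deque()
count = recover_pending(pending, {"task_executor_new-pod"}, queue)
```

Only the message held by the consumer that disappeared with the old pod is re-published; the one owned by a live consumer is left alone.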

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
2025-04-19 16:18:51 +08:00
487aed419e Fix: cite dysfunction for G component. (#7117)
### What problem does this PR solve?

#7097

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-18 18:05:26 +08:00
8b8a2f2949 fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106)
## Problem Description
Multiple files in the RAGFlow project contain closure trap issues when
using lambda functions with `trio.open_nursery()`. This problem causes
concurrent tasks created in loops to reference the same variable,
resulting in all tasks processing the same data (the data from the last
iteration) rather than each task processing its corresponding data from
the loop.

## Issue Details
When using a `lambda` to create a closure function and passing it to
`nursery.start_soon()` within a loop, the lambda function captures a
reference to the loop variable rather than its value. For example:

```python
# Problematic code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn))
```

In this pattern, when concurrent tasks begin execution, `d` has already
become the value after the loop ends (typically the last element),
causing all tasks to use the same data.

## Fix Solution
Changed the way concurrent tasks are created with `nursery.start_soon()`
by leveraging Trio's API design to directly pass the function and its
arguments separately:

```python
# Fixed code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn)
```

This way, each task uses the parameter values at the time of the
function call, rather than references captured through closures.
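
The late-binding behaviour can be reproduced without Trio. The sketch below is a plain-Python illustration, not code from the fixed files: `process` is a hypothetical stand-in for a task function, and a list of `(function, args)` pairs stands in for passing the function and its arguments separately to `nursery.start_soon()`.

```python
def process(doc: str) -> str:
    """Hypothetical stand-in for a per-document task."""
    return doc.upper()


docs = ["a", "b", "c"]

# Closure trap: each lambda looks up `d` only when it is finally called,
# by which time the loop has finished and `d` is the last element.
late_bound = []
for d in docs:
    late_bound.append(lambda: process(d))

# Fix: keep the function and its arguments separate, so the value of `d`
# is captured at scheduling time.
scheduled = []
for d in docs:
    scheduled.append((process, (d,)))

late_results = [task() for task in late_bound]
fixed_results = [fn(*args) for fn, args in scheduled]
```

Here `late_results` contains the last document three times, while `fixed_results` contains one result per document.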

## Fixed Files
Fixed closure traps in the following files:

1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword
extraction, question generation, and tag processing
2. `rag/raptor.py`: 1 fix, involving document summarization
3. `graphrag/utils.py`: 2 fixes, involving graph node and edge
processing
4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution
and graph node merging
5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document
processing
6. `graphrag/general/extractor.py`: 3 fixes, involving content
processing and graph node/edge merging
7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving
community report extraction

## Potential Impact
This fix resolves a serious concurrency issue that could have caused:
- Data processing errors (processing duplicate data)
- Performance degradation (all tasks working on the same data)
- Inconsistent results (some data not being processed)

After the fix, all concurrent tasks should correctly process their
respective data, improving system correctness and reliability.
2025-04-18 18:00:20 +08:00
42e236f464 Feat: Rendering a search test list with real data #3221 (#7138)
### What problem does this PR solve?

Feat: Rendering a search test list with real data #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-18 16:29:41 +08:00
1b4016317e fix bug chunking:expected string or bytes-like object (#7116)
### What problem does this PR solve?
Fix bug #6990: internal server error while chunking: expected string or
bytes-like object

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: unknown <taoshi.ln@chinatelecom.cn>
2025-04-18 14:42:36 +08:00
b1798bafb0 Fix: handle sometimes graph index will miss explanation (#7127)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7053

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-18 14:24:36 +08:00
86f76df586 Feat: Retrieval test #3221 (#7121)
### What problem does this PR solve?

Feat: Retrieval test #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-17 19:03:55 +08:00
db82c15de4 Fix: wrong “available” property when list chunk (#7093)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7083

Internally, fields returned from ES are converted to str, so the
bool conversion does not work as expected.
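
A small illustration of why the conversion misbehaves, assuming the flag arrives as a string such as `"0"` or `"1"`. `to_bool` is a hypothetical helper, and the set of accepted string values is an assumption; it is not the code from this PR.

```python
def to_bool(value) -> bool:
    """Interpret a flag that may arrive as an int, a bool, or a string."""
    if isinstance(value, str):
        # Any non-empty string is truthy, so compare against known values.
        return value.strip().lower() in {"1", "true"}
    return bool(value)


naive = bool("0")       # non-empty string, so this is truthy
parsed = to_bool("0")   # explicit parsing gives the intended result
```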

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-17 17:17:35 +08:00
627fd002ae Update utils.py (#7091)
### What problem does this PR solve?

When there are multiple entities, the variable `v` may be a list, which
leads to this error:
```
| File "/mnt/d/wrf/ragflow/ragflow/graphrag/utils.py", line 59, in replace_all
| result = result.replace(f"{{{k}}}", v)
| TypeError: replace() argument 2 must be str, not list
```
This PR converts `v` to a str.
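
A minimal sketch of such a conversion, modelled on the `result.replace(f"{{{k}}}", v)` line in the traceback. How the PR actually turns the list into a string is not shown here; joining the items with `", "` is an assumption made for illustration.

```python
def replace_all(template: str, variables: dict) -> str:
    """Substitute {name} placeholders, accepting list values as well as strings."""
    result = template
    for k, v in variables.items():
        if isinstance(v, (list, tuple)):
            # Assumed policy: flatten multiple entities into one string.
            v = ", ".join(str(item) for item in v)
        result = result.replace(f"{{{k}}}", str(v))
    return result


rendered = replace_all("Entities: {names}", {"names": ["ALPHA", "BETA"]})
```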

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-17 17:17:09 +08:00
9e7d052c8d Fix: knowledge graph resolution with infinity raise error tokenizing in specific situations (#7048)
### What problem does this PR solve?

When running graph resolution with Infinity, if a single quotation mark
appears in the name of an entity to be deleted, sqlglot may raise a
tokenizing error after Infinity is called.

For example:
```
INFINITY delete table ragflow_xxx, filter knowledge_graph_kwd IN ('entity') AND entity_kwd IN ('86 IMAGES FROM PREVIOUS CONTESTS', 'ADAM OPTIMIZATION', 'BACKGROUND'ESTIMATION')
```
may raise the error
```
Error tokenizing 'TS', 'ADAM OPTIMIZATION', 'BACKGROUND'ESTIMATION''
```
and cause document parsing to fail.

Replacing each single quotation mark with two single quotation marks
lets sqlglot tokenize the entity name correctly.
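
The escaping rule can be sketched as below. `sql_quote` and `in_clause` are hypothetical helper names; the PR may apply the replacement at a different place in the code.

```python
from typing import Iterable


def sql_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def in_clause(column: str, values: Iterable[str]) -> str:
    """Build a `column IN (...)` filter from raw string values."""
    return f"{column} IN ({', '.join(sql_quote(v) for v in values)})"


clause = in_clause("entity_kwd", ["ADAM OPTIMIZATION", "BACKGROUND'ESTIMATION"])
```

The entity name that previously broke tokenizing is now emitted as `'BACKGROUND''ESTIMATION'`.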

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-17 16:15:21 +08:00
d9927f5185 Fix: Error in sending placeholder words in Chinese and Chinese-Traditional (#7094)
### What problem does this PR solve?

The assistant message placeholder is incorrect. I have corrected it in
both the Chinese and Traditional Chinese translations.

### Type of change


- [x] Bug Fix
2025-04-17 15:52:03 +08:00
5d253e0a34 Fix: pymysql.err.InterfaceError: (0, '') during long time streaming chat responses (#6548) (#7057)
### Related Issue:
https://github.com/infiniflow/ragflow/issues/6548

### Related PR:
https://github.com/infiniflow/ragflow/pull/6861


### Environment:
Commit version:
[[48730e0](48730e00a8)]

### Bug Description:
Unexpected `pymysql.err.InterfaceError: (0, '')` when using Peewee +
PyMySQL + PooledMySQLDatabase after a long-running `chat streamly`
operation.

This is a common issue with Peewee + PyMySQL + connection pooling: you
end up using a connection that was silently closed by the server, but
Peewee doesn't realize it's dead.

**I found that the error only occurs during longer streaming outputs**
and is unrelated to the database connection context, so it's likely
because:

- The prolonged streaming response caused the database connection to
time out

- The original database connection might have been disconnected by the
server during the streaming process

### Why This Happens
This error happens even when using `@DB.connection_context()` after the
stream is done. After investigation, I found this is caused by MySQL
connection pools that appear to be open but are actually dead (expired
due to `wait_timeout`).

1. `@DB.connection_context()` (as a decorator or context manager) pulls
a connection from the pool.

2. If this connection was idle and expired on the MySQL server (e.g.,
due to `wait_timeout`), but not closed in Python, it will still be
considered “open” (`DB.is_closed() == False`).

3. The real error will occur only when I execute a SQL command (such as
`.get_or_none()`), and PyMySQL tries to send it to the server via a
broken socket.


### Changes Made:

1. I implemented manual connection checks before executing SQL:
```
    try:
        DB.execute_sql("SELECT 1")
    except Exception:
        print("Connection dead, reconnecting...")
        DB.close()
        DB.connect()
```
2. Delayed the token count update until after the streaming response is
completed to ensure the streaming output isn't interrupted by database
operations.
```
        total_tokens = 0 
        for txt in chat_streamly(system, history, gen_conf):
            if isinstance(txt, int):
                total_tokens = txt
......
                break
......
        if total_tokens > 0:
            if not TenantLLMService.increase_usage(self.tenant_id, self.llm_type, txt, self.llm_name):
                logging.error("LLMBundle.chat_streamly can't update token usage for {}/CHAT llm_name: {}, content: {}".format(self.tenant_id, self.llm_name, txt))
```
2025-04-16 19:15:35 +08:00
de5727f90a Fix: Files being parsed are not allowed to be deleted in batches #7065 (#7066)
### What problem does this PR solve?

Fix: Files being parsed are not allowed to be deleted in batches #7065

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-16 16:46:24 +08:00
9c2dd70839 Miscellaneous editorial updates. (#7047)
### What problem does this PR solve?

#6910 

### Type of change

- [x] Documentation Update
2025-04-16 10:31:10 +08:00
e0e78112a2 Docs: Change DELETE to POST in Related Questions curl example (#7054)
### What problem does this PR solve?

docs(api): Fix request method in Related Questions example (DELETE→POST)

### Type of change

- [x] Documentation Update
2025-04-16 10:29:59 +08:00
48730e00a8 Docs: updates. (#7042)
### What problem does this PR solve?

#7019

### Type of change

- [x] Documentation Update
2025-04-15 17:45:52 +08:00
e5f9d148e7 Test: Added test cases for Delete Sessions With Chat Assistant HTTP API (#7025)
### What problem does this PR solve?

cover [Delete chat assistant's
sessions](https://ragflow.io/docs/dev/http_api_reference#delete-chat-assistants-sessions)
endpoints

### Type of change

- [x] Add test cases
2025-04-15 14:54:26 +08:00
f6b280e372 Fix: when remove document do not delete the file in storage if the source is not knowledge base (#7005)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/6905

When deleting a document, check its source before removing the file from storage.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-15 12:11:41 +08:00
5af2d57086 Refa. (#7022)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-04-15 10:20:33 +08:00
7a34159737 Fix: add fallback for bad citation output (#7014)
### What problem does this PR solve?

Add fallback for bad citation output. #6948

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-15 09:33:53 +08:00
b1fa5a0754 Fix Helm Ingress template (#7018)
### What problem does this PR solve?

Fix Helm Ingress template; Trying to access a global variable within a
loop
Fix #6191

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-15 09:19:37 +08:00
018ff4dd0a Refa: update llms (#7007)
### What problem does this PR solve?

Update LLM models

### Type of change

- [x] Refactoring
2025-04-15 09:19:07 +08:00
ed352710ec Feat: Remove the rotation state of the button that parses the document #7008 (#7009)
### What problem does this PR solve?

Feat: Remove the rotation state of the button that parses the document
#7008
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-04-14 18:50:11 +08:00
0a0c1edce3 Docs: readme updating. (#7002)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-04-14 14:45:37 +08:00
18eb76f6b8 Fix: The selected state of the TreeView node cannot be seen on Mac #7000 (#7001)
### What problem does this PR solve?

Fix: The selected state of the TreeView node cannot be seen on Mac #7000

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-14 14:23:26 +08:00
ed5f81b02e Fix: abnormal cell merging. (#6991)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-14 11:00:11 +08:00
23c5ce48d1 Fix update_progress issue (#6992)
### What problem does this PR solve?

Fix update_progress issue introduced by #6975 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-14 10:23:13 +08:00
de766ba628 Fix: Fix api page translation issue. #3221 (#6993)
### What problem does this PR solve?

Fix: Fix api page translation issue. #3221

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-14 10:23:00 +08:00
5aae73c230 Make error messages during PPT processing clearer. (#6980)
### What problem does this PR solve?

Sometimes a slide may trigger a Proxy error (ArgumentException:
Parameter is not valid) due to issues in the original file, and this
error message can be confusing for users.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [x] Other (please describe):
2025-04-14 10:10:20 +08:00
b578451e6a docs: update Docker build commands to specify platform as linux/amd64 (#6977)
### What problem does this PR solve?

The ragflow_deps image is only available for the `linux/amd64`
platform, so running the docker build commands on macOS, for
instance, without the platform flag fails with an error due to the
platform mismatch. Specifying the platform in the docker build command
fixes this issue.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [X] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-14 10:07:39 +08:00
53c653b099 fix RAGFlowPdfParser AttributeError: 'PdfReader' object has no attribute 'close' err (#6859)
I use PdfParser locally (following this example:
https://github.com/infiniflow/ragflow/blob/main/rag/app/paper.py) like
this:
```
import re
import openpyxl

from ragflow.api.db import ParserType
from ragflow.rag.nlp import rag_tokenizer, tokenize, tokenize_table, add_positions, bullets_category, \
    title_frequency, \
    tokenize_chunks
from ragflow.rag.utils import num_tokens_from_string
from ragflow.deepdoc.parser import PdfParser, ExcelParser, DocxParser,PlainParser


def logger(prog=None, msg=""):
    print(msg)


class Pdf(PdfParser):
    def __init__(self):
        self.model_speciess = ParserType.MANUAL.value
        super().__init__()

    def __call__(self, filename, binary=None, from_page=0,
                 to_page=100000, zoomin=3, callback=None):
        from timeit import default_timer as timer
        start = timer()
        callback(msg="OCR is running...")

        self.__images__(
            filename if not binary else binary,
            zoomin,
            from_page,
            to_page,
            callback
        )
        callback(msg="OCR finished.")
        print("OCR:", timer() - start)
   
        self._layouts_rec(zoomin)
        callback(0.65, "Layout analysis finished.")
        print("layouts:", timer() - start)

        self._table_transformer_job(zoomin)
        callback(0.67, "Table analysis finished.")


        self._text_merge()
        tbls = self._extract_table_figure(True, zoomin, True, True)
        self._concat_downward()  
        self._filter_forpages()   
        callback(0.68, "Text merging finished")

        # clean mess
        for b in self.boxes:
            b["text"] = re.sub(r"([\t  ]|\u3000){2,}", " ", b["text"].strip())

        return [(b["text"], b.get("layout_no", ""), self.get_position(b, zoomin))
                for i, b in enumerate(self.boxes)], tbls


```

It fails with this error:
```
  File "xxxxx/third_party/ragflow/deepdoc/parser/pdf_parser.py", line 1039, in __images__
    self.pdf.close()
AttributeError: 'PdfReader' object has no attribute 'close'
```

I found that the RAGFlow source code uses
`pdfplumber.open` (https://github.com/infiniflow/ragflow/blob/main/deepdoc/parser/pdf_parser.py#L1007C28-L1007C43)

and then replaces `self.pdf` with `pdf2_read` (`from pypdf import PdfReader as
pdf2_read`) on line 1024
(https://github.com/infiniflow/ragflow/blob/main/deepdoc/parser/pdf_parser.py#L1024):
```
self.pdf = pdf2_read
```


---
and I found that `pdfplumber` can be used in this way:
```
file_path="xxx.pdf"
res = pdfplumber.open(file_path)
res.close()
```

But `pypdf.PdfReader` has no `close` function; its source code reads the
file like this:

```
 with open(stream, "rb") as fh:
         stream = BytesIO(fh.read())
          self._stream_opened = True
```
> https://github.com/py-pdf/pypdf/blob/main/pypdf/_reader.py#L156

So I moved the `self.pdf.close` function call, which fixed this problem.
Hoping this helps the project 😊
2025-04-14 09:40:13 +08:00
b70abe52b2 Fix: Ensure lock is released in update_progress using context manager (#6975)
RAGFlow v0.17 also hits this problem (#1453). The task table shows
that the actual task has completed, but because the process_msg of the
task is not synchronized to the document table, the page shows no
progress update.
This may be caused by the lock not being released when an exception
occurs.

```/api/ragflow_server.py
def update_progress():
    lock_value = str(uuid.uuid4())
    redis_lock = RedisDistributedLock("update_progress", lock_value=lock_value, timeout=60)
    logging.info(f"update_progress lock_value: {lock_value}")
    while not stop_event.is_set():
        try:
            if redis_lock.acquire():
                DocumentService.update_progress()
                redis_lock.release()
            stop_event.wait(6)
        except Exception:
            logging.exception("update_progress exception")
++       if redis_lock.acquired:
++               redis_lock.release()
```
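
The context-manager approach named in the commit title can be sketched as follows. `FakeLock` is a hypothetical in-memory stand-in for `RedisDistributedLock`, and `held` is an illustrative helper; neither is the code merged in this PR.

```python
from contextlib import contextmanager


class FakeLock:
    """Hypothetical in-memory stand-in for a distributed lock."""

    def __init__(self) -> None:
        self.held = False

    def acquire(self) -> bool:
        if self.held:
            return False
        self.held = True
        return True

    def release(self) -> None:
        self.held = False


@contextmanager
def held(lock):
    """Yield whether the lock was acquired; always release it on exit."""
    acquired = lock.acquire()
    try:
        yield acquired
    finally:
        if acquired:
            lock.release()


lock = FakeLock()
try:
    with held(lock) as ok:
        if ok:
            raise RuntimeError("update failed")  # simulate a failing update
except RuntimeError:
    pass

released_after_error = not lock.held
```

Because the release happens in `finally`, an exception raised during the update no longer leaves the lock held until its timeout.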
2025-04-11 20:46:19 +08:00
98670c3755 Fix: KB update_time changed whenever system relaunched (#6959)
### What problem does this PR solve?

Fix KB update_time changed whenever system relaunched. #6953 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-11 20:10:49 +08:00
9b789c2ae9 Test: Added test cases for Update Session With Chat Assistant HTTP API (#6968)
### What problem does this PR solve?

cover [Update chat assistant's
sessions](https://ragflow.io/docs/dev/http_api_reference#update-chat-assistants-session)
endpoints

### Type of change

- [x] Update test cases
2025-04-11 20:10:24 +08:00
ffb9f01bea Test: Update test cases for PR 6906 ISSUE 6875 (#6971)
### What problem does this PR solve?

PR #6906 ISSUE #6875

### Type of change

- [ ] Update test cases
2025-04-11 20:09:44 +08:00
ed7244f5f5 Fix: Delete unused pages (#6973)
### What problem does this PR solve?

Fix: Delete unused pages

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-11 20:06:58 +08:00
e54c0e39b5 fix bug [ERROR][Exception]: 8 vs. 9 (#6955)
### What problem does this PR solve?

Sometimes the `s` in a chunk `(s, a)` is an empty string. The condition
`if s and len(a) > 0` in the line
`chunks = [(s, a) for s, a in chunks if s and len(a) > 0]` then drops that
chunk, which changes the length of the new `chunks`. As a result, the final
assertion `assert len(chunks) - end == n_clusters, "{} vs. {}".format(len(chunks) - end, n_clusters)`
fails and raises a confusing error like `7 vs. 8`.
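
A simplified illustration of the length mismatch, not the actual clustering code: the sample `chunks` and the `expected` count are made up, and only the filtering expression is taken from the description above.

```python
def drop_empty(chunks):
    """Apply the filter quoted above: keep chunks with text and a non-empty vector."""
    return [(s, a) for s, a in chunks if s and len(a) > 0]


chunks = [("alpha", [0.1, 0.2]), ("", [0.3, 0.4]), ("gamma", [0.5, 0.6])]
expected = len(chunks)      # a count taken before filtering
kept = drop_empty(chunks)   # the chunk with the empty string is dropped

mismatch = "{} vs. {}".format(len(kept), expected)
```

Any count computed before the filter no longer matches the list length afterwards, which is what surfaces as the "N vs. M" assertion message.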

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-11 17:01:49 +08:00