Added a guide on setting page rank. (#6645)

### What problem does this PR solve?


### Type of change


- [x] Documentation Update

---------

Co-authored-by: balibabu <cike8899@users.noreply.github.com>
This commit is contained in:
writinwaters
2025-03-31 11:44:18 +08:00
committed by GitHub
parent 805a8f1f47
commit 2793c8e4fe
12 changed files with 123 additions and 121 deletions


@ -131,25 +131,17 @@ Yes, we support enhancing user queries based on existing context of an ongoing c
When debugging your chat assistant, you can use AI search as a reference to verify your model settings and retrieval strategy.
---
## Troubleshooting
---
### Issues with Docker images
---
#### How to build the RAGFlow image from scratch?
### How to build the RAGFlow image from scratch?
See [Build a RAGFlow Docker image](./develop/build_docker_image.mdx).
---
### Issues with huggingface models
---
#### Cannot access https://huggingface.co
### Cannot access https://huggingface.co
A locally deployed RAGFlow downloads OCR and embedding modules from the [Hugging Face website](https://huggingface.co) by default. If your machine is unable to access this site, the following error occurs and PDF parsing fails:
@ -180,7 +172,7 @@ To fix this issue, use https://hf-mirror.com instead:
---
#### `MaxRetryError: HTTPSConnectionPool(host='hf-mirror.com', port=443)`
### `MaxRetryError: HTTPSConnectionPool(host='hf-mirror.com', port=443)`
This error suggests that you do not have Internet access or are unable to connect to hf-mirror.com. Try the following:
@ -193,17 +185,13 @@ This error suggests that you do not have Internet access or are unable to connec
---
### Issues with RAGFlow servers
---
#### `WARNING: can't find /raglof/rag/res/borker.tm`
### `WARNING: can't find /raglof/rag/res/borker.tm`
Ignore this warning and continue. All system warnings can be ignored.
---
#### `network anomaly There is an abnormality in your network and you cannot connect to the server.`
### `network anomaly There is an abnormality in your network and you cannot connect to the server.`
![anomaly](https://github.com/infiniflow/ragflow/assets/93570324/beb7ad10-92e4-4a58-8886-bfb7cbd09e5d)
@ -226,11 +214,7 @@ You will not log in to RAGFlow unless the server is fully initialized. Run `dock
---
### Issues with RAGFlow backend services
---
#### `Realtime synonym is disabled, since no redis connection`
### `Realtime synonym is disabled, since no redis connection`
Ignore this warning and continue. All system warnings can be ignored.
@ -238,7 +222,7 @@ Ignore this warning and continue. All system warnings can be ignored.
---
#### Why does my document parsing stall at under one percent?
### Why does my document parsing stall at under one percent?
![stall](https://github.com/infiniflow/ragflow/assets/93570324/3589cc25-c733-47d5-bbfc-fedb74a3da50)
@ -255,7 +239,7 @@ Click the red cross beside the 'parsing status' bar, then restart the parsing pr
---
#### Why does my pdf parsing stall near completion, while the log does not show any error?
### Why does my pdf parsing stall near completion, while the log does not show any error?
Click the red cross beside the 'parsing status' bar, then restart the parsing process to see if the issue remains. If the issue persists and your RAGFlow is deployed locally, the parsing process is likely killed due to insufficient RAM. Try increasing your memory allocation by increasing the `MEM_LIMIT` value in **docker/.env**.
@ -276,13 +260,13 @@ docker compose up -d
---
#### `Index failure`
### `Index failure`
An index failure usually indicates an unavailable Elasticsearch service.
---
#### How to check the log of RAGFlow?
### How to check the log of RAGFlow?
```bash
tail -f ragflow/docker/ragflow-logs/*.log
@ -290,7 +274,7 @@ tail -f ragflow/docker/ragflow-logs/*.log
---
#### How to check the status of each component in RAGFlow?
### How to check the status of each component in RAGFlow?
1. Check the status of the Elasticsearch Docker container:
@ -315,7 +299,7 @@ The status of a Docker container does not necessarily reflect the status
---
#### `Exception: Can't connect to ES cluster`
### `Exception: Can't connect to ES cluster`
1. Check the status of the Elasticsearch Docker container:
@ -339,19 +323,19 @@ The status of a Docker container does not necessarily reflect the status
---
#### Can't start ES container and get `Elasticsearch did not exit normally`
### Can't start ES container and get `Elasticsearch did not exit normally`
This is because you forgot to update the `vm.max_map_count` value in **/etc/sysctl.conf** and your change to this value was reset after a system reboot.
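A persistent fix can be sketched as follows (262144 is the minimum `vm.max_map_count` Elasticsearch documents as required; adjust if your deployment specifies a different value):

```shell
# Persist vm.max_map_count across reboots by appending it to /etc/sysctl.conf:
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
# Apply the setting to the running kernel immediately, without a reboot:
sudo sysctl -w vm.max_map_count=262144
```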
---
#### `{"data":null,"code":100,"message":"<NotFound '404: Not Found'>"}`
### `{"data":null,"code":100,"message":"<NotFound '404: Not Found'>"}`
Your IP address or port number may be incorrect. If you are using the default configurations, enter `http://<IP_OF_YOUR_MACHINE>` (**NOT 9380, AND NO PORT NUMBER REQUIRED!**) in your browser. This should work.
---
#### `Ollama - Mistral instance running at 127.0.0.1:11434 but cannot add Ollama as model in RagFlow`
### `Ollama - Mistral instance running at 127.0.0.1:11434 but cannot add Ollama as model in RagFlow`
A correct Ollama IP address and port are crucial to adding models in RAGFlow:
@ -362,37 +346,13 @@ See [Deploy a local LLM](./guides/models/deploy_local_llm.mdx) for more informat
---
#### Do you offer examples of using DeepDoc to parse PDF or other files?
### Do you offer examples of using DeepDoc to parse PDF or other files?
Yes, we do. See the Python files under the **rag/app** folder.
---
#### Why did I fail to upload a 128MB+ file to my locally deployed RAGFlow?
Ensure that you update the **MAX_CONTENT_LENGTH** environment variable:
1. In **ragflow/docker/.env**, uncomment environment variable `MAX_CONTENT_LENGTH`:
```
MAX_CONTENT_LENGTH=176160768 # 168MB
```
2. Update **ragflow/docker/nginx/nginx.conf**:
```
client_max_body_size 168M;
```
3. Restart the RAGFlow server:
```
docker compose up ragflow -d
```
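The two limits above must agree. A quick sketch of the arithmetic, confirming that the byte value in **docker/.env** is exactly 168 MiB:

```python
# MAX_CONTENT_LENGTH is given in bytes; nginx's client_max_body_size uses the
# matching human-readable form. 168 MiB = 168 * 1024 * 1024 bytes.
mib = 1024 * 1024
max_content_length = 168 * mib
print(max_content_length)  # 176160768, matching the value in docker/.env
```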
---
#### `FileNotFoundError: [Errno 2] No such file or directory`
### `FileNotFoundError: [Errno 2] No such file or directory`
1. Check the status of the MinIO Docker container:
@ -418,14 +378,6 @@ The status of a Docker container does not necessarily reflect the status
---
### How to increase the length of RAGFlow responses?
1. Right-click the desired dialog to display the **Chat Configuration** window.
2. Switch to the **Model Setting** tab and adjust the **Max Tokens** slider to get the desired length.
3. Click **OK** to confirm your change.
---
### How to run RAGFlow with a locally deployed LLM?
You can use Ollama or Xinference to deploy a local LLM. See [here](./guides/models/deploy_local_llm.mdx) for more information.
@ -440,7 +392,7 @@ If your model is not currently supported but has APIs compatible with those of O
---
### How to interconnect RAGFlow with Ollama?
### How to integrate RAGFlow with Ollama?
- If RAGFlow is locally deployed, ensure that your RAGFlow and Ollama are in the same LAN.
- If you are using our online demo, ensure that the IP address of your Ollama server is public and accessible.
@ -486,3 +438,14 @@ See [Acquire a RAGFlow API key](./develop/acquire_ragflow_api_key.md).
See [Upgrade RAGFlow](./guides/upgrade_ragflow.mdx) for more information.
---
### How to switch the document engine to Infinity?
To switch your document engine from Elasticsearch to [Infinity](https://github.com/infiniflow/infinity):
1. In **docker/.env**, set `DOC_ENGINE=${DOC_ENGINE:-infinity}`
2. Restart your Docker image:
```bash
docker compose -f docker-compose.yml up -d
```


@ -27,7 +27,7 @@ In contrast, chunks created from [knowledge graph construction](./construct_know
### Similarity threshold
This sets the bar for retrieving chunks: chunks with similarities below the threshold will be filtered out. By default, the threshold is set to 0.2. That means that only chunks with hybrid similarity score of 20 or higher will be retrieved.
This sets the bar for retrieving chunks: chunks with similarities below the threshold will be filtered out. By default, the threshold is set to 0.2. This means that only chunks with a hybrid similarity score of 20 or higher will be retrieved.
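The filtering step can be sketched in a few lines (a minimal illustration of the rule described above, not RAGFlow's actual implementation; the chunk scores are hypothetical):

```python
# Hybrid similarity scores are on a 0-100 scale, so a 0.2 threshold maps to 20.
similarity_threshold = 0.2

chunks = [
    {"id": "c1", "hybrid_score": 83.0},
    {"id": "c2", "hybrid_score": 19.5},  # below 0.2 * 100 = 20: filtered out
    {"id": "c3", "hybrid_score": 20.0},  # exactly 20: retained
]

retrieved = [c for c in chunks if c["hybrid_score"] >= similarity_threshold * 100]
print([c["id"] for c in retrieved])  # ['c1', 'c3']
```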
### Keyword similarity weight


@ -0,0 +1,39 @@
---
sidebar_position: 3
slug: /set_page_rank
---
# Set page rank
Create a step-retrieval strategy using page rank.
---
## Scenario
In an AI-powered chat, you can configure a chat assistant or an agent to respond using knowledge retrieved from multiple specified knowledge bases (datasets), provided that they use the same embedding model. When you prefer information from certain knowledge bases to take precedence or to be retrieved first, you can use RAGFlow's page rank feature to increase the ranking of chunks from those knowledge bases. For example, suppose you have configured a chat assistant to draw from two knowledge bases, knowledge base A for 2024 news and knowledge base B for 2023 news, but wish to prioritize news from 2024: this feature is particularly useful in such cases.
:::info NOTE
The page rank feature operates at the level of an entire knowledge base, not on individual files or documents.
:::
## Configuration
On the **Configuration** page of your knowledge base, drag the slider under **Page rank** to set the page rank value for your knowledge base. You can also enter the intended value directly in the field next to the slider.
:::info NOTE
The page rank value must be an integer in the range [0, 100]:
- 0: disabled (default)
- 1 to 100: enabled
:::
:::tip
If you set the page rank value to a non-integer, say 1.7, it will be rounded down to the nearest integer, which in this case is 1.
:::
## Mechanism
If you configure a chat assistant's **similarity threshold** to 0.2, only chunks with a hybrid score greater than 0.2 x 100 = 20 will be retrieved and sent to the chat model for content generation. This initial filtering step is crucial for narrowing down relevant information.
If you have assigned a page rank of 1 to knowledge base A (2024 news) and 0 to knowledge base B (2023 news), the final hybrid scores of the retrieved chunks will be adjusted accordingly. A chunk from knowledge base A with an initial score of 50 will receive a boost of 1 x 100 = 100 points, resulting in a final score of 50 + 1 x 100 = 150. In this way, chunks retrieved from knowledge base A will always precede chunks from knowledge base B.
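The mechanism above can be sketched in a few lines (an illustration of the scoring rule described here, not RAGFlow's internal code):

```python
def final_score(hybrid_score: float, page_rank: float) -> float:
    """Boost a chunk's hybrid score by page_rank * 100 points.

    page_rank is the knowledge base's integer page rank in [0, 100];
    a non-integer input is rounded down, e.g. 1.7 -> 1.
    """
    return hybrid_score + int(page_rank) * 100

# Knowledge base A (2024 news) has page rank 1; B (2023 news) has page rank 0.
score_a = final_score(50, 1)   # 50 + 1 * 100 = 150
score_b = final_score(90, 0)   # 90 + 0 * 100 = 90
print(score_a > score_b)       # True: chunks from A always precede chunks from B
```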