Use 'float' explicitly for OpenAI's embedding "encoding_format" (#9838)

### What problem does this PR solve?

In the openai-python client, the '/v1/embeddings' parameter 'encoding_format' defaults to 'base64' when it is not given. Pass 'float' explicitly to avoid the base64 encoding and decoding round trip and the larger data size.

https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py

    if not is_given(encoding_format):
        params["encoding_format"] = "base64"

### Type of change

- [x] Performance Improvement
Refactor: Improve the buffer close for vision_llm_chunk (#9845)
2026-01-04 03:25:30 +08:00 · 2025-09-02 10:31:51 +08:00 · 2025-09-02 10:31:37 +08:00 · 2025-09-02 10:28:23 +08:00
4 changed files with 82 additions and 19 deletions
--- a/docs/guides/agent/agent_component_reference/retrieval.mdx
+++ b/docs/guides/agent/agent_component_reference/retrieval.mdx
@@ -9,19 +9,70 @@ A component that retrieves information from specified datasets.

 ## Scenarios

-A **Retrieval** component is essential in most RAG scenarios, where information is extracted from designated knowledge bases before being sent to the LLM for content generation. As of v0.20.4, a **Retrieval** component can operate either as a workflow component or as a tool of an **Agent**, enabling the Agent to control its invocation and search queries.
+A **Retrieval** component is essential in most RAG scenarios, where information is extracted from designated knowledge bases before being sent to the LLM for content generation. A **Retrieval** component can operate either as a standalone workflow module or as a tool for an **Agent** component. In the latter role, the **Agent** component has autonomous control over when to invoke it for query and retrieval.
+
+The following screenshot shows a reference design using the **Retrieval** component, where the component serves as a tool for an **Agent** component. You can find it from the **Report Agent Using Knowledge Base** Agent template.
+
+![retrieval_reference_design](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/retrieval_reference_design.jpg)
+
+## Prerequisites
+
+Ensure you [have properly configured your target knowledge base(s)](../../dataset/configure_knowledge_base.md).
+
+## Quickstart
+
+### 1. Click on a **Retrieval** component to show its configuration panel  
+
+The corresponding configuration panel appears to the right of the canvas. Use this panel to define and fine-tune the **Retrieval** component's search behavior.
+
+### 2. Input query variable(s)
+
+The **Retrieval** component relies on query variables to specify its queries. 
+
+:::caution IMPORTANT
+- If you use the **Retrieval** component as a standalone workflow module, input query variables in the **Input Variables** text box.
+- If it is used as a tool for an **Agent** component, input the query variables in the **Agent** component's **User prompt** field.
+:::
+
+By default, you can use `sys.query`, which is the user query and the default output of the **Begin** component. All global variables defined before the **Retrieval** component can also be used as query statements. Use the `(x)` button or type `/` to show all the available query variables.
+
+### 3. Select knowledge base(s) to query
+
+You can specify one or multiple knowledge bases to retrieve data from. If you select multiple, ensure they use the same embedding model.
+
+### 4. Expand **Advanced Settings** to configure the retrieval method
+
+By default, a combination of weighted keyword similarity and weighted vector cosine similarity is used during retrieval. If a rerank model is selected, a combination of weighted keyword similarity and weighted reranking score will be used for retrieval.
+
+As a starter, you can skip this step to stay with the default retrieval method.
+
+:::caution WARNING
+Using a rerank model will *significantly* increase the system's response time. If you must use a rerank model, ensure you use a SaaS reranker; if you prefer a locally deployed rerank model, ensure you start RAGFlow with **docker-compose-gpu.yml**.
+:::
+
+### 5. Enable cross-language search
+
+If the language of your user query differs from the languages of the knowledge bases, you can select the target languages in the **Cross-language search** dropdown menu. The model will then translate queries to ensure accurate matching of semantic meaning across languages.
+
+
+### 6. Test retrieval results
+
+Click the triangle button at the top of the canvas to test the retrieval results.
+
+### 7. Choose the next component
+
+When necessary, click the **+** button on the **Retrieval** component to choose the next component in the workflow from the dropdown list.
+

 ## Configurations

-Click on a **Retrieval** component to open its configuration window.
-
 ### Query variables

 *Mandatory*

-Select the query source for retrieval.
+Select the query source for retrieval. Defaults to `sys.query`, which is the default output of the **Begin** component.

-The **Retrieval** component relies on query variables to specify its data inputs (queries). All global variables defined before the **Retrieval** component are available in the dropdown list.  
+The **Retrieval** component relies on query variables to specify its queries. All global variables defined before the **Retrieval** component can also be used as queries. Use the `(x)` button or type `/` to show all the available query variables.

 ### Knowledge bases 

@@ -72,8 +123,23 @@ Select one or more languages for cross‑language search. If no language is sele

 ### Use knowledge graph

+:::caution IMPORTANT
+Before enabling this feature, ensure you have properly [constructed a knowledge graph from each target knowledge base](../../dataset/construct_knowledge_graph.md).
+:::
+
 Whether to use knowledge graph(s) in the specified knowledge base(s) during retrieval for multi-hop question answering. When enabled, this would involve iterative searches across entity, relationship, and community report chunks, greatly increasing retrieval time.

 ### Output

 The global variable name for the output of the **Retrieval** component, which can be referenced by other components in the workflow.
+
+
+## Frequently asked questions
+
+### How to reduce response time?
+
+Go through the checklist below for best performance:
+
+- Leave the **Rerank model** field empty.
+- If you must use a rerank model, ensure you use a SaaS reranker; if you prefer a locally deployed rerank model, ensure you start RAGFlow with **docker-compose-gpu.yml**.
+- Disable **Use knowledge graph**.
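
The retrieval.mdx changes above describe the default retrieval method as a combination of weighted keyword similarity and weighted vector cosine similarity. A minimal sketch of that idea, assuming a simple linear blend: the function names, the `vector_weight` parameter, and its default of 0.3 are hypothetical and are not taken from RAGFlow's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(keyword_sim, vector_sim, vector_weight=0.3):
    """Hypothetical linear blend of keyword and vector similarity.

    vector_weight is an illustrative knob; the keyword side gets the remainder.
    """
    return (1.0 - vector_weight) * keyword_sim + vector_weight * vector_sim


# Example: one chunk scored against one query.
query_vec = [1.0, 0.0]
chunk_vec = [1.0, 0.0]
score = hybrid_score(keyword_sim=0.5, vector_sim=cosine_similarity(query_vec, chunk_vec))
```

With a rerank model selected, the same blend would presumably use the reranking score in place of the cosine similarity, as the documentation text states.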
--- a/docs/quickstart.mdx
+++ b/docs/quickstart.mdx
@@ -267,7 +267,7 @@ RAGFlow also supports deploying LLMs locally using Ollama, Xinference, or LocalA

 To add and configure an LLM: 

-1. Click on your logo on the top right of the page **>** **Model providers**。
+1. Click on your logo on the top right of the page **>** **Model providers**.

 2. Click on the desired LLM and update the API key accordingly.

@@ -286,7 +286,7 @@ You are allowed to upload files to a knowledge base in RAGFlow and parse them in

 To create your first knowledge base:

-1. Click the **Knowledge Base** tab in the top middle of the page **>** **Create knowledge base**.
+1. Click the **Dataset** tab in the top middle of the page **>** **Create dataset**.

 2. Input the name of your knowledge base and click **OK** to confirm your changes.

@@ -330,7 +330,7 @@ RAGFlow features visibility and explainability, allowing you to view the chunkin
   ![update chunk](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/add_keyword_question.jpg)

 :::caution NOTE
-You can add keywords to a file chunk to improve its ranking for queries containing those keywords. This action increases its keyword weight and can improve its position in search list.
+You can add keywords or questions to a file chunk to improve its ranking for queries containing those keywords. This action increases its keyword weight and can improve its position in the search list.
 :::

 4. In Retrieval testing, ask a quick question in **Test text** to double check if your configurations work:
--- a/rag/app/picture.py
+++ b/rag/app/picture.py
@@ -78,15 +78,12 @@ def vision_llm_chunk(binary, vision_model, prompt=None, callback=None):
    txt = ""

    try:
-        img_binary = io.BytesIO()
-        img.save(img_binary, format='JPEG')
-        img_binary.seek(0)
-
-        ans = clean_markdown_block(vision_model.describe_with_prompt(img_binary.read(), prompt))
-
-        txt += "\n" + ans
-
-        return txt
+        with io.BytesIO() as img_binary:
+            img.save(img_binary, format='JPEG')
+            img_binary.seek(0)
+            ans = clean_markdown_block(vision_model.describe_with_prompt(img_binary.read(), prompt))
+            txt += "\n" + ans
+            return txt

    except Exception as e:
        callback(-1, str(e))
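
The picture.py change above wraps the in-memory buffer in a `with` block so that it is closed on every exit path, including the early `return`. A small standalone sketch of that behavior, using only the standard library; `read_via_buffer` is a hypothetical helper and stands in for the image-saving code in the diff.

```python
import io


def read_via_buffer(payload: bytes):
    """Write payload to an in-memory buffer, read it back, and return both."""
    with io.BytesIO() as buf:
        buf.write(payload)
        buf.seek(0)
        data = buf.read()
        # Returning from inside the with block still closes buf on exit.
        return data, buf


data, buf = read_via_buffer(b"jpeg-bytes")
print(data, buf.closed)
```

The bytes are read before the block exits, so the returned data remains valid after the buffer itself has been closed.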
--- a/rag/llm/embedding_model.py
+++ b/rag/llm/embedding_model.py
@@ -145,7 +145,7 @@ class OpenAIEmbed(Base):
        ress = []
        total_tokens = 0
        for i in range(0, len(texts), batch_size):
-            res = self.client.embeddings.create(input=texts[i : i + batch_size], model=self.model_name)
+            res = self.client.embeddings.create(input=texts[i : i + batch_size], model=self.model_name, encoding_format="float")
            try:
                ress.extend([d.embedding for d in res.data])
                total_tokens += self.total_token_count(res)
@@ -154,7 +154,7 @@ class OpenAIEmbed(Base):
        return np.array(ress), total_tokens

    def encode_queries(self, text):
-        res = self.client.embeddings.create(input=[truncate(text, 8191)], model=self.model_name)
+        res = self.client.embeddings.create(input=[truncate(text, 8191)], model=self.model_name, encoding_format="float")
        return np.array(res.data[0].embedding), self.total_token_count(res)
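
To illustrate the round trip that `encoding_format="float"` avoids, here is a sketch of what handling a base64-encoded embedding could look like. It assumes the payload is a packed array of little-endian float32 values, which is an assumption about the wire format and is not confirmed by this diff; both helper functions are hypothetical and use only the standard library.

```python
import base64
import struct


def encode_embedding_base64(values):
    """Pack floats as little-endian float32 and base64-encode them (assumed format)."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_embedding_base64(payload):
    """Reverse of encode_embedding_base64: base64 text back to a list of floats."""
    raw = base64.b64decode(payload)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


# These values are exactly representable in float32, so the round trip is lossless.
embedding = [0.5, -1.25, 2.0]
payload = encode_embedding_base64(embedding)
restored = decode_embedding_base64(payload)
```

With `encoding_format="float"`, the response already carries a list of floats, so no such decoding step is needed before building the NumPy array.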