ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-31 07:36:46 +08:00

Author	SHA1	Message	Date
Rainman	340354b79c	fix the error 'Unknown field for GenerationConfig: max_tokens' when u… (#8473 ) ### What problem does this PR solve? [https://github.com/infiniflow/ragflow/issues/8324](url) docker image version: v0.19.1 The `_clean_conf` function was not implemented in the `_chat` and `chat_streamly` methods of the `GeminiChat` class, causing the error "Unknown field for GenerationConfig: max_tokens" when the default LLM config includes the "max_tokens" parameter. Buggy Code(ragflow/rag/llm/chat_model.py) ```python class GeminiChat(Base): def __init__(self, key, model_name, base_url=None, kwargs): super().__init__(key, model_name, base_url=base_url, kwargs) from google.generativeai import GenerativeModel, client client.configure(api_key=key) _client = client.get_default_generative_client() self.model_name = "models/" + model_name self.model = GenerativeModel(model_name=self.model_name) self.model._client = _client def _clean_conf(self, gen_conf): for k in list(gen_conf.keys()): if k not in ["temperature", "top_p"]: del gen_conf[k] return gen_conf def _chat(self, history, gen_conf): from google.generativeai.types import content_types system = history[0]["content"] if history and history[0]["role"] == "system" else "" hist = [] for item in history: if item["role"] == "system": continue hist.append(deepcopy(item)) item = hist[-1] if "role" in item and item["role"] == "assistant": item["role"] = "model" if "role" in item and item["role"] == "system": item["role"] = "user" if "content" in item: item["parts"] = item.pop("content") if system: self.model._system_instruction = content_types.to_content(system) response = self.model.generate_content(hist, generation_config=gen_conf) ans = response.text return ans, response.usage_metadata.total_token_count def chat_streamly(self, system, history, gen_conf): from google.generativeai.types import content_types if system: self.model._system_instruction = content_types.to_content(system) #❌_clean_conf was not implemented for k in list(gen_conf.keys()): if k not in ["temperature", "top_p", "max_tokens"]: del gen_conf[k] for item in history: if "role" in item and item["role"] == "assistant": item["role"] = "model" if "content" in item: item["parts"] = item.pop("content") ans = "" try: response = self.model.generate_content(history, generation_config=gen_conf, stream=True) for resp in response: ans = resp.text yield ans yield response._chunks[-1].usage_metadata.total_token_count except Exception as e: yield ans + "\nERROR: " + str(e) yield 0 ``` Implement the _clean_conf function ```python class GeminiChat(Base): def __init__(self, key, model_name, base_url=None, kwargs): super().__init__(key, model_name, base_url=base_url, kwargs) from google.generativeai import GenerativeModel, client client.configure(api_key=key) _client = client.get_default_generative_client() self.model_name = "models/" + model_name self.model = GenerativeModel(model_name=self.model_name) self.model._client = _client def _clean_conf(self, gen_conf): for k in list(gen_conf.keys()): if k not in ["temperature", "top_p"]: del gen_conf[k] return gen_conf def _chat(self, history, gen_conf): from google.generativeai.types import content_types #✅ implement _clean_conf to remove the wrong parameters gen_conf = self._clean_conf(gen_conf) system = history[0]["content"] if history and history[0]["role"] == "system" else "" hist = [] for item in history: if item["role"] == "system": continue hist.append(deepcopy(item)) item = hist[-1] if "role" in item and item["role"] == "assistant": item["role"] = "model" if "role" in item and item["role"] == "system": item["role"] = "user" if "content" in item: item["parts"] = item.pop("content") if system: self.model._system_instruction = content_types.to_content(system) response = self.model.generate_content(hist, generation_config=gen_conf) ans = response.text return ans, response.usage_metadata.total_token_count def chat_streamly(self, system, history, gen_conf): from google.generativeai.types import content_types #✅ implement _clean_conf to remove the wrong parameters gen_conf = self._clean_conf(gen_conf) if system: self.model._system_instruction = content_types.to_content(system) #✅Removed duplicate parameter filtering logic "for k in list(gen_conf.keys()):" for item in history: if "role" in item and item["role"] == "assistant": item["role"] = "model" if "content" in item: item["parts"] = item.pop("content") ans = "" try: response = self.model.generate_content(history, generation_config=gen_conf, stream=True) for resp in response: ans = resp.text yield ans yield response._chunks[-1].usage_metadata.total_token_count except Exception as e: yield ans + "\nERROR: " + str(e) yield 0 ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-25 16:23:35 +08:00
Rainman	49d67cbcb7	fix a bug when using huggingface embedding api (#8432 ) ### What problem does this PR solve? image_version: v0.19.1 This PR fixes a bug in the HuggingFaceEmBedding API method that was causing AssertionError: assert len(vects) == len(docs) during the document embedding process. #### Problem The HuggingFaceEmbed.encode() method had an early return statement inside the for loop, causing it to return after processing only the first text input instead of processing all texts in the input list. Error Messenge ```python AssertionError: assert len(vects) == len(docs) # input chunks != embedded vectors from embedding api File "/ragflow/rag/svr/task_executor.py", line 442, in embedding ``` Buggy code(/ragflow/rag/llm/embedding_model.py) ```python class HuggingFaceEmbed(Base): def __init__(self, key, model_name, base_url=None): if not model_name: raise ValueError("Model name cannot be None") self.key = key self.model_name = model_name.split("___")[0] self.base_url = base_url or "http://127.0.0.1:8080" def encode(self, texts: list): embeddings = [] for text in texts: response = requests.post(...) if response.status_code == 200: try: embedding = response.json() embeddings.append(embedding[0]) # ❌ Early return return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts]) except Exception as _e: log_exception(_e, response) else: raise Exception(...) ``` Fixed Code(I just Rollback this function to the v0.19.0 version) ```python Class HuggingFaceEmbed(Base): def __init__(self, key, model_name, base_url=None): if not model_name: raise ValueError("Model name cannot be None") self.key = key self.model_name = model_name.split("___")[0] self.base_url = base_url or "http://127.0.0.1:8080" def encode(self, texts: list): embeddings = [] for text in texts: response = requests.post(...) if response.status_code == 200: embedding = response.json() embeddings.append(embedding[0]) # ✅ Only append, no return else: raise Exception(...) return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts]) # ✅ Return after processing all ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-24 09:35:02 +08:00
Song Fuchang	fd7ac17605	Feat: Scratch MCP tool calling support. (#8263 ) ### What problem does this PR solve? This is a cherry-pick from #7781 as requested. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-23 17:45:35 +08:00
Liu An	244d8a47b9	Fix: AzureChat model code (#8426 ) ### What problem does this PR solve? - Simplify AzureChat constructor by passing base_url directly - Clean up spacing and formatting in chat_model.py - Remove redundant parentheses and improve code consistency - #8423 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-23 15:59:25 +08:00
Stephen Hu	ef5e7d8c44	Fix:embedding_model class SILICONFLOWEmbed(Base)Function reusing json (#8378 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8360 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-20 11:13:00 +08:00
Stephen Hu	35034fed73	Fix: Raptor: [Bug]: ERROR: Unknown field for GenerationConfig: max_tokens (#8331 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8324 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-18 16:40:57 +08:00
Kevin Hu	b1117a8717	Fix: base url issue. (#8281 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-16 13:40:25 +08:00
Kevin Hu	65d5268439	Feat: implement novitaAI embedding and reranking. (#8250 ) ### What problem does this PR solve? Close #8227 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-13 15:42:17 +08:00
Kevin Hu	d36c8d18b1	Refa: make exception more clear. (#8224 ) ### What problem does this PR solve? #8156 ### Type of change - [x] Refactoring	2025-06-12 17:53:59 +08:00
Kevin Hu	d5236b71f4	Refa: ollama keep alive issue. (#8216 ) ### What problem does this PR solve? #8122 ### Type of change - [x] Refactoring	2025-06-12 15:09:40 +08:00
Kevin Hu	56ee69e9d9	Refa: chat with tools. (#8210 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-06-12 12:31:10 +08:00
Yongteng Lei	1a5f991d86	Fix: auto-keyword and auto-question fail with qwq model (#8190 ) ### What problem does this PR solve? Fix auto-keyword and auto-question fail with qwq model. #8189 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 11:37:07 +08:00
Kevin Hu	69e1fc496d	Refa: chat models (#8187 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-06-11 17:20:12 +08:00
Liu An	a43adafc6b	Refa: Add error handling for JSON decode in embedding models (#8162 ) ### What problem does this PR solve? Improve robustness of Jina, Nvidia, and SILICONFLOW embedding models by: 1. Adding try-catch blocks for JSON decode errors 2. Logging error details including response content 3. Raising exceptions with meaningful error messages ### Type of change - [x] Refactoring	2025-06-10 19:04:17 +08:00
Kevin Hu	7ed9efcd4e	Fix: QWenCV issue. (#8106 ) ### What problem does this PR solve? Close #8097 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-06 17:55:13 +08:00
Kevin Hu	156290f8d0	Fix: url path join issue. (#8013 ) ### What problem does this PR solve? Close #7980 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 14:18:40 +08:00
Song Fuchang	a1f06a4fdc	Feat: Support tool calling in Generate component (#7572 ) ### What problem does this PR solve? Hello, our use case requires LLM agent to invoke some tools, so I made a simple implementation here. This PR does two things: 1. A simple plugin mechanism based on `pluginlib`: This mechanism lives in the `plugin` directory. It will only load plugins from `plugin/embedded_plugins` for now. A sample plugin `bad_calculator.py` is placed in `plugin/embedded_plugins/llm_tools`, it accepts two numbers `a` and `b`, then give a wrong result `a + b + 100`. In the future, it can load plugins from external location with little code change. Plugins are divided into different types. The only plugin type supported in this PR is `llm_tools`, which must implement the `LLMToolPlugin` class in the `plugin/llm_tool_plugin.py`. More plugin types can be added in the future. 2. A tool selector in the `Generate` component: Added a tool selector to select one or more tools for LLM: ![image](https://github.com/user-attachments/assets/74a21fdf-9333-4175-991b-43df6524c5dc) And with the `bad_calculator` tool, it results this with the `qwen-max` model: ![image](https://github.com/user-attachments/assets/93aff9c4-8550-414a-90a2-1a15a5249d94) ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2025-05-16 16:32:19 +08:00
Kevin Hu	5b626870d0	Refa: remove ollama keep alive. (#7560 ) ### What problem does this PR solve? #7518 ### Type of change - [x] Refactoring	2025-05-09 17:51:49 +08:00
Stephen Hu	65537b8200	Fix:Set CUDA_VISIBLE_DEVICES In DefaultEmbedding (#7465 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7420 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-06 14:38:36 +08:00
Neal Davis	23dcbc94ef	feat: replace models of novita (#7360 ) ### What problem does this PR solve? Replace models of novita ### Type of change - [x] Other (please describe): Replace models of novita	2025-04-28 13:35:09 +08:00
Yongteng Lei	97a13ef1ab	Fix: Qwen-vl-plus url error (#7281 ) ### What problem does this PR solve? Fix Qwen-vl-* url error. #7277 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-25 09:20:10 +08:00
Yongteng Lei	a008b38cf5	Fix: local variable referenced before assignment (#6909 ) ### What problem does this PR solve? Fix: local variable referenced before assignment. #6803 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-09 20:29:12 +08:00
Yongteng Lei	dc2c74b249	Feat: add primitive support for function calls (#6840 ) ### What problem does this PR solve? This PR introduces primitive support for function calls, enabling the system to handle basic function call capabilities. However, this feature is currently experimental and not yet enabled for general use, as it is only supported by a subset of models, namely, Qwen and OpenAI models. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-08 16:09:03 +08:00
Zhichang Yu	e7a2a4b7ff	Log llm response on exception (#6750 ) ### What problem does this PR solve? Log llm response on exception ### Type of change - [x] Refactoring	2025-04-02 17:10:57 +08:00
Alex Chen	46b5e32cd7	Feat: support vision llm for gpustack (#6636 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/6138 This PR is going to support vision llm for gpustack, modify url path from `/v1-openai` to `/v1` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-31 15:33:52 +08:00
Marcus Yuan	c61df5dd25	Dynamic Context Window Size for Ollama Chat (#6582 ) # Dynamic Context Window Size for Ollama Chat ## Problem Statement Previously, the Ollama chat implementation used a fixed context window size of 32768 tokens. This caused two main issues: 1. Performance degradation due to unnecessarily large context windows for small conversations 2. Potential business logic failures when using smaller fixed sizes (e.g., 2048 tokens) ## Solution Implemented a dynamic context window size calculation that: 1. Uses a base context size of 8192 tokens 2. Applies a 1.2x buffer ratio to the total token count 3. Adds multiples of 8192 tokens based on the buffered token count 4. Implements a smart context size update strategy ## Implementation Details ### Token Counting Logic ```python def count_tokens(text): """Calculate token count for text""" # Simple calculation: 1 token per ASCII character # 2 tokens for non-ASCII characters (Chinese, Japanese, Korean, etc.) total = 0 for char in text: if ord(char) < 128: # ASCII characters total += 1 else: # Non-ASCII characters total += 2 return total ``` ### Dynamic Context Calculation ```python def _calculate_dynamic_ctx(self, history): """Calculate dynamic context window size""" # Calculate total tokens for all messages total_tokens = 0 for message in history: content = message.get("content", "") content_tokens = count_tokens(content) role_tokens = 4 # Role marker token overhead total_tokens += content_tokens + role_tokens # Apply 1.2x buffer ratio total_tokens_with_buffer = int(total_tokens * 1.2) # Calculate context size in multiples of 8192 if total_tokens_with_buffer <= 8192: ctx_size = 8192 else: ctx_multiplier = (total_tokens_with_buffer // 8192) + 1 ctx_size = ctx_multiplier * 8192 return ctx_size ``` ### Integration in Chat Method ```python def chat(self, system, history, gen_conf): if system: history.insert(0, {"role": "system", "content": system}) if "max_tokens" in gen_conf: del gen_conf["max_tokens"] try: # Calculate new context size new_ctx_size = self._calculate_dynamic_ctx(history) # Prepare options with context size options = { "num_ctx": new_ctx_size } # Add other generation options if "temperature" in gen_conf: options["temperature"] = gen_conf["temperature"] if "max_tokens" in gen_conf: options["num_predict"] = gen_conf["max_tokens"] if "top_p" in gen_conf: options["top_p"] = gen_conf["top_p"] if "presence_penalty" in gen_conf: options["presence_penalty"] = gen_conf["presence_penalty"] if "frequency_penalty" in gen_conf: options["frequency_penalty"] = gen_conf["frequency_penalty"] # Make API call with dynamic context size response = self.client.chat( model=self.model_name, messages=history, options=options, keep_alive=60 ) return response["message"]["content"].strip(), response.get("eval_count", 0) + response.get("prompt_eval_count", 0) except Exception as e: return "ERROR: " + str(e), 0 ``` ## Benefits 1. Improved Performance: Uses appropriate context windows based on conversation length 2. Better Resource Utilization: Context window size scales with content 3. Maintained Compatibility: Works with existing business logic 4. Predictable Scaling: Context growth in 8192-token increments 5. Smart Updates: Context size updates are optimized to reduce unnecessary model reloads ## Future Considerations 1. Fine-tune buffer ratio based on usage patterns 2. Add monitoring for context window utilization 3. Consider language-specific token counting optimizations 4. Implement adaptive threshold based on conversation patterns 5. Add metrics for context size update frequency --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-03-28 12:38:27 +08:00
Kevin Hu	d2043ff9f2	Fix: LmStudioChat issue. (#6591 ) ### What problem does this PR solve? #6577 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-27 14:59:15 +08:00
Yongteng Lei	df3890827d	Refa: change LLM chat output from full to delta (incremental) (#6534 ) ### What problem does this PR solve? Change LLM chat output from full to delta (incremental) ### Type of change - [x] Refactoring	2025-03-26 19:33:14 +08:00
Kevin Hu	cc8029a732	Fix: uploading in chat box issue. (#6547 ) ### What problem does this PR solve? #6228 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-26 15:37:48 +08:00
Kevin Hu	12ad746ee6	Fix: Bedrock model invocation error. (#6533 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-26 11:27:12 +08:00
Kevin Hu	60c3a253ad	Fix: api-key issue for xinference. (#6490 ) ### What problem does this PR solve? #2792 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-25 15:01:13 +08:00
Kevin Hu	095fc84cf2	Fix: claude max tokens. (#6484 ) ### What problem does this PR solve? #6458 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-25 10:41:55 +08:00
Kevin Hu	b77ce4e846	Feat: support api-key for Ollama. (#6448 ) ### What problem does this PR solve? #6189 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-24 14:53:17 +08:00
Kevin Hu	85eb3775d6	Refa: update Anthropic models. (#6445 ) ### What problem does this PR solve? #6421 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-24 12:34:57 +08:00
zhou	a6aed0da46	Fix: rerank with YoudaoRerank issue. (#6396 ) ### What problem does this PR solve? Fix rerank with YoudaoRerank issue，"'YoudaoRerank' object has no attribute '_dynamic_batch_size'" ![17425412353825](https://github.com/user-attachments/assets/9ed304c7-317a-440e-acff-fe895fc20f07) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-24 10:09:16 +08:00
fansir	efc4796f01	Fix ratelimit errors during document parsing (#6413 ) ### What problem does this PR solve? When using the online large model API knowledge base to extract knowledge graphs, frequent Rate Limit Errors were triggered, causing document parsing to fail. This commit fixes the issue by optimizing API calls in the following way: Added exponential backoff and jitter to the API call to reduce the frequency of Rate Limit Errors. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-03-22 23:07:03 +08:00
Kevin Hu	a2a4bfe3e3	Fix: change ollama default num_ctx. (#6395 ) ### What problem does this PR solve? #6163 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-21 16:22:03 +08:00
zhou	85480f6292	Fix: the error of Ollama embeddings interface returning "500 Internal Server Error" (#6350 ) ### What problem does this PR solve? Fix the error where the Ollama embeddings interface returns a “500 Internal Server Error” when using models such as xiaobu-embedding-v2 for embedding. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-21 15:25:48 +08:00
Kevin Hu	d83911b632	Fix: huggingface rerank model issue. (#6385 ) ### What problem does this PR solve? #6348 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-21 12:43:32 +08:00
Kevin Hu	5b04b7d972	Fix: rerank with vllm issue. (#6306 ) ### What problem does this PR solve? #6301 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-20 11:52:42 +08:00
Kevin Hu	c6e1a2ca8a	Feat: add TTS support for SILICONFLOW. (#6264 ) ### What problem does this PR solve? #6244 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-19 12:52:12 +08:00
Yongteng Lei	5cf610af40	Feat: add vision LLM PDF parser (#6173 ) ### What problem does this PR solve? Add vision LLM PDF parser ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-03-18 14:52:20 +08:00
Kevin Hu	e9a6675c40	Fix: enable ollama api-key. (#6205 ) ### What problem does this PR solve? #6189 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-18 13:37:34 +08:00
Kevin Hu	7e4d693054	Fix: in case response.choices[0].message.content is None. (#6190 ) ### What problem does this PR solve? #6164 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-18 10:00:27 +08:00
Kevin Hu	56b228f187	Refa: remove max toekns for image2txt models. (#6078 ) ### What problem does this PR solve? #6063 ### Type of change - [x] Refactoring	2025-03-14 13:51:45 +08:00
writinwaters	9c8060f619	0.17.1 release notes (#6021 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-03-13 14:43:24 +08:00
Kevin Hu	3571270191	Refa: refine the context window size warning. (#5993 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-03-12 19:40:54 +08:00
kuro5989	6e13922bdc	Feat: Add qwq model support to Tongyi-Qianwen factory (#5981 ) ### What problem does this PR solve? add qwq model support to Tongyi-Qianwen factory https://github.com/infiniflow/ragflow/issues/5869 ### Type of change - [x] New Feature (non-breaking change which adds functionality) ![image](https://github.com/user-attachments/assets/49f5c6a0-ecaf-41dd-a23a-2009f854d62c) ![image](https://github.com/user-attachments/assets/93ffa303-920e-4942-8188-bcd6b7209204) ![1741774779438](https://github.com/user-attachments/assets/25f2fd1d-8640-4df0-9a08-78ee9daaa8fe) ![image](https://github.com/user-attachments/assets/4763cf6c-1f76-43c4-80ee-74dfd666a184) Co-authored-by: zhaozhicheng <zhicheng.zhao@fastonetech.com>	2025-03-12 18:54:15 +08:00
Edouard Hur	b29539b442	Fix: CoHereRerank not respecting base_url when provided (#5784 ) ### What problem does this PR solve? vLLM provider with a reranking model does not work : as vLLM uses under the hood the [CoHereRerank provider](https://github.com/infiniflow/ragflow/blob/v0.17.0/rag/llm/__init__.py#L250) with a `base_url`, if this URL [is not passed to the Cohere client](https://github.com/infiniflow/ragflow/blob/v0.17.0/rag/llm/rerank_model.py#L379-L382) any attempt will endup on the Cohere SaaS (sending your private api key in the process) instead of your vLLM instance. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-03-10 11:22:06 +08:00
Kevin Hu	df9b7b2fe9	Fix: rerank issue. (#5696 ) ### What problem does this PR solve? #5673 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-06 15:05:19 +08:00

1 2 3 4 5 ...

296 Commits