From 6c9afd1ffb7f8fb0e13f65d87465f26303fdde9a Mon Sep 17 00:00:00 2001 From: Yingfeng Date: Mon, 22 Dec 2025 13:31:39 +0800 Subject: [PATCH] Potential fix for code scanning alert no. 60: Clear-text logging of sensitive information (#12068) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Potential fix for [https://github.com/infiniflow/ragflow/security/code-scanning/60](https://github.com/infiniflow/ragflow/security/code-scanning/60) In general, the correct fix is to ensure that no sensitive data (passwords, API keys, full connection strings with embedded credentials, etc.) is ever written to logs. This can be done by (1) whitelisting only clearly non-sensitive fields for logging, and/or (2) explicitly scrubbing or masking any value that might contain credentials before logging, and (3) not relying on later deletion from the dictionary to protect against logging, since the log call already happened. For this function, the best minimal fix is: - Keep the idea of a safe key whitelist, but strengthen it so we are absolutely sure we never log `password` or `connection_string`, even indirectly. - Avoid building the logged dict from the same potentially-tainted `kwargs` object before we have removed sensitive keys, or relying solely on key names that might change. - Construct a separate, small log context that is obviously safe: scheme, host, port, database, table, and possibly a boolean like `has_password` instead of the password itself. - Optionally, add a small helper to derive this safe log context, but given the scope we can keep it inline. Concretely in `rag/utils/opendal_conn.py`: - Replace the current `SAFE_LOG_KEYS` / `loggable_kwargs` / `logging.info(...)` block so that: - We do not pass through arbitrary `kwargs` values by key filtering alone. - We instead build a new dict with explicitly chosen, non-sensitive fields, e.g.: ```python safe_log_info = { "scheme": kwargs.get("scheme"), "host": kwargs.get("host"), "port": kwargs.get("port"), "database": kwargs.get("database"), "table": kwargs.get("table"), "has_password": "password" in kwargs or "connection_string" in kwargs, } logging.info("Loaded OpenDAL configuration (non sensitive fields only): %s", safe_log_info) ``` - This makes sure that neither the password nor a connection string containing it is ever logged, while still retaining useful diagnostic information. - Keep the existing deletion of `password` and `connection_string` from `kwargs` after logging, as an additional safety measure for any later use of `kwargs`. No new imports or external libraries are required; we only modify lines 45–56 of the shown snippet. _Suggested fixes powered by Copilot Autofix. Review carefully before merging._ Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --- rag/utils/opendal_conn.py | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/rag/utils/opendal_conn.py b/rag/utils/opendal_conn.py index c261665f7..b188346c7 100644 --- a/rag/utils/opendal_conn.py +++ b/rag/utils/opendal_conn.py @@ -45,9 +45,16 @@ def get_opendal_config(): # Only include non-sensitive keys in logs. Do NOT # add 'password' or any key containing embedded credentials # (like 'connection_string'). - SAFE_LOG_KEYS = ['scheme', 'host', 'port', 'database', 'table'] # explicitly non-sensitive - loggable_kwargs = {k: v for k, v in kwargs.items() if k in SAFE_LOG_KEYS} - logging.info("Loaded OpenDAL configuration (non sensitive fields only): %s", loggable_kwargs) + safe_log_info = { + "scheme": kwargs.get("scheme"), + "host": kwargs.get("host"), + "port": kwargs.get("port"), + "database": kwargs.get("database"), + "table": kwargs.get("table"), + # indicate presence of credentials without logging them + "has_credentials": any(k in kwargs for k in ("password", "connection_string")), + } + logging.info("Loaded OpenDAL configuration (non sensitive fields only): %s", safe_log_info) # For safety, explicitly remove sensitive keys from kwargs after use if "password" in kwargs: