Potential fix for code scanning alert no. 59: Clear-text logging of sensitive information (#12069)
Potential fix for [https://github.com/infiniflow/ragflow/security/code-scanning/59](https://github.com/infiniflow/ragflow/security/code-scanning/59)

General approach: ensure that HTTP logs never contain raw secrets, even when those secrets appear in URLs or belong to highly sensitive endpoints. There are two complementary strategies: (1) for clearly sensitive endpoints (e.g., OAuth token URLs), suppress URL logging entirely; and (2) ensure that any URL that is logged has every parameter name that might carry a secret strongly redacted, in a way that static analysis can recognize as a dedicated sanitization step.

The best targeted fix here, without changing behavior for non-sensitive traffic, is:

1. Strengthen the `_SENSITIVE_QUERY_KEYS` set to include any likely secret-bearing keys (e.g., `client_id` can still be sensitive depending on the threat model, so we err on the safe side and redact it as well).
2. Ensure `_is_sensitive_url` (in `common/http_client.py`, though its body is not shown) treats OAuth-related URLs such as those from `settings.GITHUB_OAUTH` and `settings.FEISHU_OAUTH` as sensitive and therefore disables URL logging. Since its body is not shown, the safe, non-invasive change we can make in the displayed snippet is to route all logging through the existing redaction function and to default to *not logging the URL* when we cannot guarantee it is safe.
3. To satisfy CodeQL for this specific sink, simplify the logging message so that, in retry/failure paths, the URL is no longer included at all; instead we log only the method and a generic placeholder (e.g., `"async_request attempt ... failed; retrying..."`). This fully removes the tainted URL from the sink and addresses all alert variants for that logging statement, while preserving useful operational information (method, attempt index, delay).

Concretely, in `common/http_client.py`, inside `async_request`:

- Keep the successful-request debug log as-is (it already uses `_redact_sensitive_url_params` and `_is_sensitive_url`, and is likely safe and useful).
- In the `except httpx.RequestError` block:
  - For the "exhausted retries" warning, remove the URL from the message or, if we still want a hint, log only a redacted/sanitized label that does not derive from `url`. The simplest option is to omit the URL entirely.
  - For the per-attempt failure warning (line 162), similarly remove `log_url` (and thus any use of `url`) from the formatted message so that the sink no longer contains tainted data.

These changes are entirely within the provided snippet, require no new imports, do not change the functional behavior of HTTP requests or retry logic, and eliminate the direct flow from `url` to the logging sink that CodeQL flags; a sketch of the redaction helpers this plan assumes follows below.

---

_Suggested fixes powered by Copilot Autofix. Review carefully before merging._

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
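The helper bodies referenced above are not shown in the alerted snippet, so the following is only a minimal sketch of what `_SENSITIVE_QUERY_KEYS`, `_redact_sensitive_url_params`, and `_is_sensitive_url` might look like under this plan; the key list and the URL heuristics are illustrative assumptions, not the actual ragflow code.

```python
# Sketch only: assumed shapes for the redaction helpers named in the plan above.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed key list; the real set in common/http_client.py may differ.
_SENSITIVE_QUERY_KEYS = {
    "access_token", "refresh_token", "token", "id_token",
    "api_key", "apikey", "secret", "client_secret", "client_id",
    "password", "code", "signature",
}


def _redact_sensitive_url_params(url: str) -> str:
    """Return the URL with the value of any secret-bearing query parameter masked."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    redacted = [
        (key, "***REDACTED***" if key.lower() in _SENSITIVE_QUERY_KEYS else value)
        for key, value in parse_qsl(query, keep_blank_values=True)
    ]
    return urlunsplit((scheme, netloc, path, urlencode(redacted), fragment))


def _is_sensitive_url(url: str) -> bool:
    """Heuristic: treat OAuth/token endpoints as sensitive so their URLs are never logged."""
    lowered = url.lower()
    return any(marker in lowered for marker in ("oauth", "/token", "access_token="))
```

Routing every logged URL through `_redact_sensitive_url_params`, and skipping URL logging whenever `_is_sensitive_url` returns `True`, gives static analysis a dedicated sanitization step to recognize, as described above.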
@@ -174,6 +174,12 @@ async def async_request(

    logger.warning(
        f"async_request attempt {attempt + 1}/{retries + 1} failed for {method}; retrying in {delay:.2f}s"
    )
    raise
    delay = _get_delay(backoff_factor, attempt)
    # Avoid including the (potentially sensitive) URL in retry logs.
    logger.warning(
        f"async_request attempt {attempt + 1}/{retries + 1} failed for {method}; retrying in {delay:.2f}s"
    )
    await asyncio.sleep(delay)
    raise last_exc  # pragma: no cover
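For context, here is a self-contained sketch of the retry loop this hunk belongs to, assuming `httpx` and a hypothetical `_get_delay` backoff helper; the function signature, the backoff formula, and the exhausted-retries message are illustrative assumptions rather than the actual `async_request` implementation.

```python
# Sketch only: a retry wrapper that logs method/attempt/delay but never the URL,
# so no tainted data reaches the logging sink.
import asyncio
import logging
import random

import httpx

logger = logging.getLogger(__name__)


def _get_delay(backoff_factor: float, attempt: int) -> float:
    # Assumed helper: exponential backoff with a little jitter.
    return backoff_factor * (2 ** attempt) + random.uniform(0, 0.1)


async def async_request(method: str, url: str, retries: int = 3,
                        backoff_factor: float = 0.5, **kwargs) -> httpx.Response:
    last_exc = None
    async with httpx.AsyncClient() as client:
        for attempt in range(retries + 1):
            try:
                return await client.request(method, url, **kwargs)
            except httpx.RequestError as e:
                last_exc = e
                if attempt >= retries:
                    # Exhausted retries: log without the URL so no secret can leak.
                    logger.warning(f"async_request failed after {retries + 1} attempts for {method}")
                    raise
                delay = _get_delay(backoff_factor, attempt)
                # Per-attempt failure: again, only method/attempt/delay are logged.
                logger.warning(
                    f"async_request attempt {attempt + 1}/{retries + 1} failed for {method}; retrying in {delay:.2f}s"
                )
                await asyncio.sleep(delay)
    raise last_exc  # pragma: no cover (unreachable: the exhausted-retries branch re-raises)
```

Because the warning messages are built only from `method`, `attempt`, and `delay`, the data flow from `url` to the log call is broken regardless of what the URL contains.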