Feature/docs generator (#11858)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### What problem does this PR solve?

This PR introduces a new Docs Generator agent component for producing downloadable PDF, DOCX, or TXT files from Markdown content generated within a RAGFlow workflow.

### **Key Features**

**Backend**

- New component: DocsGenerator (agent/component/docs_generator.py)
  - Markdown → PDF/DOCX/TXT conversion
  - Supports tables, lists, code blocks, headings, and rich formatting
  - Configurable document style (fonts, margins, colors, page size, orientation)
  - Optional header logo and footer with page numbers/timestamps

**Frontend**

- New configuration UI for the Docs Generator
  - Download button integrated into the chat interface
  - Output wired to the Message component
  - Full i18n support

**Documentation**

Added component guide: docs/guides/agent/agent_component_reference/docs_generator.md

**Usage**

Add the Docs Generator to a workflow, connect Markdown output from an upstream component, configure metadata/style, and feed its output into the Message component. Users will see a document download button directly in the chat.

**Contributor Note**

We have been following RAGFlow for more than a year and a half and have worked extensively on customizing the framework and integrating it into several of our internal systems. In that time, we have built multiple platforms that rely on RAGFlow as a core component, which has given us a strong appreciation for how flexible and powerful the project is.

We also previously contributed the full Italian translation, and we were glad to see it accepted. This new Docs Generator component was created for our own production needs, and we believe it may be useful to many others in the community as well.

We want to sincerely thank the entire RAGFlow team for the remarkable work you have done and continue to do. If there are opportunities to contribute further, we would be glad to help whenever we have time available. It would be a pleasure to support the project in any way we can.

If appropriate, we would be glad to be listed among the project’s contributors, but in any case we look forward to continuing to support and contribute to the project.

PentaFrame Development Team

---------

Co-authored-by: PentaFrame <info@pentaframe.it>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-02-08 03:25:06 +08:00 · 2025-12-12 07:59:43 +01:00
parent 6560388f2b
commit f9510edbbc
29 changed files with 3043 additions and 102 deletions
--- a/web/src/components/message-item/index.tsx
+++ b/web/src/components/message-item/index.tsx
@@ -14,6 +14,11 @@ import { cn } from '@/lib/utils';
 import MarkdownContent from '../markdown-content';
 import { ReferenceDocumentList } from '../next-message-item/reference-document-list';
 import { UploadedMessageFiles } from '../next-message-item/uploaded-message-files';
+import {
+  PDFDownloadButton,
+  extractPDFDownloadInfo,
+  removePDFDownloadInfo,
+} from '../pdf-download-button';
 import { RAGFlowAvatar } from '../ragflow-avatar';
 import { useTheme } from '../theme-provider';
 import { AssistantGroupButton, UserGroupButton } from './group-button';
@@ -61,6 +66,20 @@ const MessageItem = ({
    return reference?.doc_aggs ?? [];
  }, [reference?.doc_aggs]);

+  // Extract document download info (PDF, DOCX, or TXT) from message content
+  const pdfDownloadInfo = useMemo(
+    () => extractPDFDownloadInfo(item.content),
+    [item.content],
+  );
+
+  // If we have PDF download info, extract the remaining text
+  const messageContent = useMemo(() => {
+    if (!pdfDownloadInfo) return item.content;
+
+    // Remove the JSON part from the content to avoid showing it
+    return removePDFDownloadInfo(item.content, pdfDownloadInfo);
+  }, [item.content, pdfDownloadInfo]);
+
  const handleRegenerateMessage = useCallback(() => {
    regenerateMessage?.(item);
  }, [regenerateMessage, item]);
@@ -122,23 +141,34 @@ const MessageItem = ({
              ></UserGroupButton>
            )}

-            <div
-              className={cn(
-                isAssistant
-                  ? theme === 'dark'
-                    ? styles.messageTextDark
-                    : styles.messageText
-                  : styles.messageUserText,
-                { '!bg-bg-card': !isAssistant },
-              )}
-            >
-              <MarkdownContent
-                loading={loading}
-                content={item.content}
-                reference={reference}
-                clickDocumentButton={clickDocumentButton}
-              ></MarkdownContent>
-            </div>
+            {/* Show PDF download button if download info is present */}
+            {pdfDownloadInfo && (
+              <PDFDownloadButton
+                downloadInfo={pdfDownloadInfo}
+                className="mb-2"
+              />
+            )}
+
+            {/* Show message content if there's any text besides the download */}
+            {messageContent && (
+              <div
+                className={cn(
+                  isAssistant
+                    ? theme === 'dark'
+                      ? styles.messageTextDark
+                      : styles.messageText
+                    : styles.messageUserText,
+                  { '!bg-bg-card': !isAssistant },
+                )}
+              >
+                <MarkdownContent
+                  loading={loading}
+                  content={messageContent}
+                  reference={reference}
+                  clickDocumentButton={clickDocumentButton}
+                ></MarkdownContent>
+              </div>
+            )}
            {isAssistant && referenceDocumentList.length > 0 && (
              <ReferenceDocumentList
                list={referenceDocumentList}
--- a/web/src/components/next-message-item/index.tsx
+++ b/web/src/components/next-message-item/index.tsx
@@ -4,6 +4,7 @@ import {
  IMessage,
  IReferenceChunk,
  IReferenceObject,
+  UploadResponseDataType,
 } from '@/interfaces/database/chat';
 import classNames from 'classnames';
 import {
@@ -24,6 +25,11 @@ import { WorkFlowTimeline } from '@/pages/agent/log-sheet/workflow-timeline';
 import { isEmpty } from 'lodash';
 import { Atom, ChevronDown, ChevronUp } from 'lucide-react';
 import MarkdownContent from '../next-markdown-content';
+import {
+  PDFDownloadButton,
+  extractPDFDownloadInfo,
+  removePDFDownloadInfo,
+} from '../pdf-download-button';
 import { RAGFlowAvatar } from '../ragflow-avatar';
 import { useTheme } from '../theme-provider';
 import { Button } from '../ui/button';
@@ -95,6 +101,20 @@ function MessageItem({
    return Object.values(docs);
  }, [reference?.doc_aggs]);

+  // Extract document download info (PDF, DOCX, or TXT) from message content
+  const pdfDownloadInfo = useMemo(
+    () => extractPDFDownloadInfo(item.content),
+    [item.content],
+  );
+
+  // If we have PDF download info, extract the remaining text
+  const messageContent = useMemo(() => {
+    if (!pdfDownloadInfo) return item.content;
+
+    // Remove the JSON part from the content to avoid showing it
+    return removePDFDownloadInfo(item.content, pdfDownloadInfo);
+  }, [item.content, pdfDownloadInfo]);
+
  const handleRegenerateMessage = useCallback(() => {
    regenerateMessage?.(item);
  }, [regenerateMessage, item]);
@@ -219,28 +239,39 @@ function MessageItem({
                  />
                </div>
              )}
-            <div
-              className={cn({
-                [theme === 'dark'
-                  ? styles.messageTextDark
-                  : styles.messageText]: isAssistant,
-                [styles.messageUserText]: !isAssistant,
-                'bg-bg-card': !isAssistant,
-              })}
-            >
-              {item.data ? (
-                children
-              ) : sendLoading && isEmpty(item.content) ? (
-                <>{!isShare && 'running...'}</>
-              ) : (
-                <MarkdownContent
-                  loading={loading}
-                  content={item.content}
-                  reference={reference}
-                  clickDocumentButton={clickDocumentButton}
-                ></MarkdownContent>
-              )}
-            </div>
+            {/* Show PDF download button if download info is present */}
+            {pdfDownloadInfo && (
+              <PDFDownloadButton
+                downloadInfo={pdfDownloadInfo}
+                className="mb-2"
+              />
+            )}
+
+            {/* Show message content if there's any text besides the download */}
+            {messageContent && (
+              <div
+                className={cn({
+                  [theme === 'dark'
+                    ? styles.messageTextDark
+                    : styles.messageText]: isAssistant,
+                  [styles.messageUserText]: !isAssistant,
+                  'bg-bg-card': !isAssistant,
+                })}
+              >
+                {item.data ? (
+                  children
+                ) : sendLoading && isEmpty(messageContent) ? (
+                  <>{!isShare && 'running...'}</>
+                ) : (
+                  <MarkdownContent
+                    loading={loading}
+                    content={messageContent}
+                    reference={reference}
+                    clickDocumentButton={clickDocumentButton}
+                  ></MarkdownContent>
+                )}
+              </div>
+            )}
            {isAssistant && referenceDocuments.length > 0 && (
              <ReferenceDocumentList
                list={referenceDocuments}
@@ -248,7 +279,9 @@ function MessageItem({
            )}

            {isUser && (
-              <UploadedMessageFiles files={item.files}></UploadedMessageFiles>
+              <UploadedMessageFiles
+                files={item.files as File[] | UploadResponseDataType[]}
+              ></UploadedMessageFiles>
            )}
            {/* {isAssistant && item.attachment && item.attachment.doc_id && (
              <div className="w-full flex items-center justify-end">
--- a/web/src/components/pdf-download-button/index.tsx
+++ b/web/src/components/pdf-download-button/index.tsx
@@ -0,0 +1,196 @@
+import { Button } from '@/components/ui/button';
+import { Download, FileText } from 'lucide-react';
+import { useCallback } from 'react';
+
+interface DocumentDownloadInfo {
+  filename: string;
+  base64: string;
+  mime_type: string;
+}
+
+interface DocumentDownloadButtonProps {
+  downloadInfo: DocumentDownloadInfo;
+  className?: string;
+}
+
+export function PDFDownloadButton({
+  downloadInfo,
+  className,
+}: DocumentDownloadButtonProps) {
+  const handleDownload = useCallback(() => {
+    try {
+      // Convert base64 to blob
+      const byteCharacters = atob(downloadInfo.base64);
+      const byteNumbers = new Array(byteCharacters.length);
+      for (let i = 0; i < byteCharacters.length; i++) {
+        byteNumbers[i] = byteCharacters.charCodeAt(i);
+      }
+      const byteArray = new Uint8Array(byteNumbers);
+      const blob = new Blob([byteArray], { type: downloadInfo.mime_type });
+
+      // Create download link
+      const url = window.URL.createObjectURL(blob);
+      const link = document.createElement('a');
+      link.href = url;
+      link.download = downloadInfo.filename;
+      document.body.appendChild(link);
+      link.click();
+
+      // Cleanup
+      document.body.removeChild(link);
+      window.URL.revokeObjectURL(url);
+    } catch (error) {
+      console.error('Error downloading document:', error);
+    }
+  }, [downloadInfo]);
+
+  // Determine document type from mime_type or filename
+  const getDocumentType = () => {
+    if (downloadInfo.mime_type === 'application/pdf') return 'PDF Document';
+    if (
+      downloadInfo.mime_type ===
+      'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
+    )
+      return 'Word Document';
+    if (downloadInfo.mime_type === 'text/plain') return 'Text Document';
+
+    // Fallback to file extension
+    const ext = downloadInfo.filename.split('.').pop()?.toUpperCase();
+    if (ext === 'PDF') return 'PDF Document';
+    if (ext === 'DOCX') return 'Word Document';
+    if (ext === 'TXT') return 'Text Document';
+
+    return 'Document';
+  };
+
+  return (
+    <div
+      className={`flex items-center gap-3 p-4 border rounded-lg bg-background-card ${className || ''}`}
+    >
+      <div className="flex-shrink-0">
+        <div className="p-2 bg-accent-primary/10 rounded-lg">
+          <FileText className="w-6 h-6 text-accent-primary" />
+        </div>
+      </div>
+      <div className="flex-1 min-w-0">
+        <div className="font-medium text-sm truncate">
+          {downloadInfo.filename}
+        </div>
+        <div className="text-xs text-muted-foreground">{getDocumentType()}</div>
+      </div>
+      <Button
+        onClick={handleDownload}
+        size="sm"
+        className="flex items-center gap-2"
+      >
+        <Download className="w-4 h-4" />
+        Download
+      </Button>
+    </div>
+  );
+}
+
+// Helper function to detect if content contains document download info
+export function extractPDFDownloadInfo(
+  content: string,
+): DocumentDownloadInfo | null {
+  try {
+    // Try to parse as JSON first (for pure JSON content)
+    const parsed = JSON.parse(content);
+    if (parsed && parsed.filename && parsed.base64 && parsed.mime_type) {
+      // Accept PDF, DOCX, and TXT formats
+      const validMimeTypes = [
+        'application/pdf',
+        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
+        'text/plain',
+      ];
+      if (validMimeTypes.includes(parsed.mime_type)) {
+        return parsed as DocumentDownloadInfo;
+      }
+    }
+  } catch {
+    // If direct parsing fails, try to extract JSON object from mixed content
+    // Look for a JSON object that contains the required fields
+    // The regex finds a candidate start; braces are then counted to find its end
+    const startPattern = /\{[^{}]*"filename"[^{}]*:/g;
+    let match;
+
+    while ((match = startPattern.exec(content)) !== null) {
+      const startIndex = match.index;
+      let braceCount = 0;
+      let endIndex = startIndex;
+
+      // Find the matching closing brace
+      for (let i = startIndex; i < content.length; i++) {
+        if (content[i] === '{') braceCount++;
+        if (content[i] === '}') braceCount--;
+
+        if (braceCount === 0) {
+          endIndex = i + 1;
+          break;
+        }
+      }
+
+      if (endIndex > startIndex) {
+        try {
+          const jsonStr = content.substring(startIndex, endIndex);
+          const parsed = JSON.parse(jsonStr);
+          if (parsed && parsed.filename && parsed.base64 && parsed.mime_type) {
+            // Accept PDF, DOCX, and TXT formats
+            const validMimeTypes = [
+              'application/pdf',
+              'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
+              'text/plain',
+            ];
+            if (validMimeTypes.includes(parsed.mime_type)) {
+              return parsed as DocumentDownloadInfo;
+            }
+          }
+        } catch {
+          // This wasn't valid JSON, continue searching
+        }
+      }
+    }
+  }
+  return null;
+}
+
+// Helper function to remove document download info from content
+export function removePDFDownloadInfo(
+  content: string,
+  downloadInfo: DocumentDownloadInfo,
+): string {
+  try {
+    // First, check if the entire content is just the JSON (most common case)
+    try {
+      const parsed = JSON.parse(content);
+      if (
+        parsed &&
+        parsed.filename === downloadInfo.filename &&
+        parsed.base64 === downloadInfo.base64
+      ) {
+        // The entire content is just the download JSON, return empty
+        return '';
+      }
+    } catch {
+      // Content is not pure JSON, continue with removal
+    }
+
+    // Try to remove the JSON string from content
+    const jsonStr = JSON.stringify(downloadInfo);
+    let cleaned = content.replace(jsonStr, '').trim();
+
+    // Also try with pretty-printed JSON (with indentation)
+    const prettyJsonStr = JSON.stringify(downloadInfo, null, 2);
+    cleaned = cleaned.replace(prettyJsonStr, '').trim();
+
+    // Also try to find and remove JSON object pattern from mixed content
+    // This handles cases where the JSON might have different formatting
+    const startPattern = /\{[^{}]*"filename"[^{}]*"base64"[^{}]*\}/g;
+    cleaned = cleaned.replace(startPattern, '').trim();
+
+    return cleaned;
+  } catch {
+    return content;
+  }
+}
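
Review note: the brace counter in `extractPDFDownloadInfo` treats every `{` and `}` as structural, so a filename such as `a{1}.pdf` could end the scan early, and `removePDFDownloadInfo` locates the payload with a separate regex that may not match the same span. One possible alternative, sketched below under assumptions, is a single scanner that skips braces inside JSON string literals and returns the payload together with its offsets, so extraction and removal share one code path. The names `DownloadPayload`, `findObjectEnd`, `findPayloadSpan`, and `stripPayload` are hypothetical and do not appear in this PR.

```typescript
// Hypothetical sketch, not part of the PR: string-aware payload extraction.
interface DownloadPayload {
  filename: string;
  base64: string;
  mime_type: string;
}

// Same MIME whitelist as the PR's extractPDFDownloadInfo.
const ALLOWED_MIME_TYPES = new Set<string>([
  'application/pdf',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  'text/plain',
]);

// Return the index just past the brace that closes the object opened at
// `start`, or -1 if it never closes. Braces inside string literals are ignored.
function findObjectEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}') {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

function isDownloadPayload(value: unknown): value is DownloadPayload {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.filename === 'string' &&
    typeof v.base64 === 'string' &&
    typeof v.mime_type === 'string' &&
    ALLOWED_MIME_TYPES.has(v.mime_type)
  );
}

// Try every '{' as a candidate start and return the first span that parses
// to a valid payload, along with its [start, end) offsets in `text`.
function findPayloadSpan(
  text: string,
): { payload: DownloadPayload; start: number; end: number } | null {
  let start = text.indexOf('{');
  while (start !== -1) {
    const end = findObjectEnd(text, start);
    if (end !== -1) {
      try {
        const parsed: unknown = JSON.parse(text.slice(start, end));
        if (isDownloadPayload(parsed)) return { payload: parsed, start, end };
      } catch {
        // Not valid JSON at this position; keep scanning.
      }
    }
    start = text.indexOf('{', start + 1);
  }
  return null;
}

// Remove the payload span, if any, and return the remaining text.
function stripPayload(text: string): string {
  const hit = findPayloadSpan(text);
  return hit ? (text.slice(0, hit.start) + text.slice(hit.end)).trim() : text;
}
```

With this shape, a message component could call `findPayloadSpan` once and derive both the button props and the remaining text from the same result. The scan is quadratic in the worst case for text with many unmatched braces, which is unlikely to matter here because base64 data contains no braces.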