Feature/docs generator (#11858)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


### What problem does this PR solve?

This PR introduces a new Docs Generator agent component for producing
downloadable PDF, DOCX, or TXT files from Markdown content generated
within a RAGFlow workflow.

### **Key Features**

**Backend**

- New component: DocsGenerator (agent/component/docs_generator.py)
- Markdown → PDF/DOCX/TXT conversion
- Supports tables, lists, code blocks, headings, and rich formatting
- Configurable document style (fonts, margins, colors, page size, orientation)
- Optional header logo and footer with page numbers/timestamps

**Frontend**

- New configuration UI for the Docs Generator
- Download button integrated into the chat interface
- Output wired to the Message component
- Full i18n support
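
The download button receives the document as a base64 string and decodes it in the browser before building a Blob. The decoding step can be sketched as below; `base64ToBytes` is a hypothetical helper name that does not appear in this PR, and the sample string is only an illustration.

```typescript
// Hypothetical helper (not part of this PR): decode a base64 string
// into raw bytes, writing directly into a Uint8Array.
function base64ToBytes(base64: string): Uint8Array {
  // atob returns a "binary string" with one character per byte.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "SGVsbG8=" is the base64 encoding of the ASCII text "Hello".
const bytes = base64ToBytes('SGVsbG8=');
console.log(bytes.length); // 5
console.log(new TextDecoder().decode(bytes)); // Hello
```

In the browser, the resulting bytes can be wrapped in a `Blob` with the payload's MIME type and offered through a temporary object URL, which is what the button in this PR does.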

**Documentation**

Added component guide:
docs/guides/agent/agent_component_reference/docs_generator.md

**Usage**

Add the Docs Generator to a workflow, connect Markdown output from an
upstream component, configure metadata/style, and feed its output into
the Message component. Users will see a document download button
directly in the chat.
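
Judging from the frontend changes in this PR, the chat UI recognizes a download when the message content contains a JSON object with `filename`, `base64`, and `mime_type` fields and an accepted MIME type. A minimal sketch of that check follows; the type guard `isDocumentDownloadInfo` and the sample payload are hypothetical and only illustrate the shape.

```typescript
// Shape of the download payload, mirroring the DocumentDownloadInfo
// interface added in this PR.
interface DocumentDownloadInfo {
  filename: string;
  base64: string;
  mime_type: string;
}

// MIME types the chat UI accepts for the download button.
const VALID_MIME_TYPES: string[] = [
  'application/pdf',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  'text/plain',
];

// Hypothetical type guard (not part of this PR).
function isDocumentDownloadInfo(
  value: unknown,
): value is DocumentDownloadInfo {
  if (typeof value !== 'object' || value === null) return false;
  const { filename, base64, mime_type } = value as Record<string, unknown>;
  return (
    typeof filename === 'string' &&
    filename.length > 0 &&
    typeof base64 === 'string' &&
    base64.length > 0 &&
    typeof mime_type === 'string' &&
    VALID_MIME_TYPES.includes(mime_type)
  );
}

// Illustrative message content as the Message component might receive it.
const messageContent = JSON.stringify({
  filename: 'report.txt',
  base64: 'SGVsbG8=',
  mime_type: 'text/plain',
});

const parsed: unknown = JSON.parse(messageContent);
console.log(isDocumentDownloadInfo(parsed)); // true
```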

**Contributor Note**

We have been following RAGFlow for more than a year and a half and
have worked extensively on customizing the framework and integrating
it into several of our internal systems. Over that time, we have built
multiple platforms that rely on RAGFlow as a core component, which has
given us a strong appreciation for how flexible and powerful the
project is.

We also previously contributed the full Italian translation, and we were
glad to see it accepted. This new Docs Generator component was created
for our own production needs, and we believe that it may be useful for
many others in the community as well.

We want to sincerely thank the entire RAGFlow team for the remarkable
work you have done and continue to do. If there are opportunities to
contribute further, we would be glad to help whenever we have time
available. It would be a pleasure to support the project in any way we
can.

If appropriate, we would be glad to be listed among the project’s
contributors, but in any case we look forward to continuing to support
and contribute to the project.

PentaFrame Development Team

---------

Co-authored-by: PentaFrame <info@pentaframe.it>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
This commit is contained in:
PentaFDevs
2025-12-12 07:59:43 +01:00
committed by GitHub
parent 6560388f2b
commit f9510edbbc
29 changed files with 3043 additions and 102 deletions


@@ -14,6 +14,11 @@ import { cn } from '@/lib/utils';
import MarkdownContent from '../markdown-content';
import { ReferenceDocumentList } from '../next-message-item/reference-document-list';
import { UploadedMessageFiles } from '../next-message-item/uploaded-message-files';
import {
PDFDownloadButton,
extractPDFDownloadInfo,
removePDFDownloadInfo,
} from '../pdf-download-button';
import { RAGFlowAvatar } from '../ragflow-avatar';
import { useTheme } from '../theme-provider';
import { AssistantGroupButton, UserGroupButton } from './group-button';
@@ -61,6 +66,20 @@ const MessageItem = ({
return reference?.doc_aggs ?? [];
}, [reference?.doc_aggs]);
// Extract PDF download info from message content
const pdfDownloadInfo = useMemo(
() => extractPDFDownloadInfo(item.content),
[item.content],
);
// If we have PDF download info, extract the remaining text
const messageContent = useMemo(() => {
if (!pdfDownloadInfo) return item.content;
// Remove the JSON part from the content to avoid showing it
return removePDFDownloadInfo(item.content, pdfDownloadInfo);
}, [item.content, pdfDownloadInfo]);
const handleRegenerateMessage = useCallback(() => {
regenerateMessage?.(item);
}, [regenerateMessage, item]);
@@ -122,23 +141,34 @@ const MessageItem = ({
></UserGroupButton>
)}
<div
className={cn(
isAssistant
? theme === 'dark'
? styles.messageTextDark
: styles.messageText
: styles.messageUserText,
{ '!bg-bg-card': !isAssistant },
)}
>
<MarkdownContent
loading={loading}
content={item.content}
reference={reference}
clickDocumentButton={clickDocumentButton}
></MarkdownContent>
</div>
{/* Show PDF download button if download info is present */}
{pdfDownloadInfo && (
<PDFDownloadButton
downloadInfo={pdfDownloadInfo}
className="mb-2"
/>
)}
{/* Show message content if there's any text besides the download */}
{messageContent && (
<div
className={cn(
isAssistant
? theme === 'dark'
? styles.messageTextDark
: styles.messageText
: styles.messageUserText,
{ '!bg-bg-card': !isAssistant },
)}
>
<MarkdownContent
loading={loading}
content={messageContent}
reference={reference}
clickDocumentButton={clickDocumentButton}
></MarkdownContent>
</div>
)}
{isAssistant && referenceDocumentList.length > 0 && (
<ReferenceDocumentList
list={referenceDocumentList}


@@ -4,6 +4,7 @@ import {
IMessage,
IReferenceChunk,
IReferenceObject,
UploadResponseDataType,
} from '@/interfaces/database/chat';
import classNames from 'classnames';
import {
@@ -24,6 +25,11 @@ import { WorkFlowTimeline } from '@/pages/agent/log-sheet/workflow-timeline';
import { isEmpty } from 'lodash';
import { Atom, ChevronDown, ChevronUp } from 'lucide-react';
import MarkdownContent from '../next-markdown-content';
import {
PDFDownloadButton,
extractPDFDownloadInfo,
removePDFDownloadInfo,
} from '../pdf-download-button';
import { RAGFlowAvatar } from '../ragflow-avatar';
import { useTheme } from '../theme-provider';
import { Button } from '../ui/button';
@@ -95,6 +101,20 @@ function MessageItem({
return Object.values(docs);
}, [reference?.doc_aggs]);
// Extract PDF download info from message content
const pdfDownloadInfo = useMemo(
() => extractPDFDownloadInfo(item.content),
[item.content],
);
// If we have PDF download info, extract the remaining text
const messageContent = useMemo(() => {
if (!pdfDownloadInfo) return item.content;
// Remove the JSON part from the content to avoid showing it
return removePDFDownloadInfo(item.content, pdfDownloadInfo);
}, [item.content, pdfDownloadInfo]);
const handleRegenerateMessage = useCallback(() => {
regenerateMessage?.(item);
}, [regenerateMessage, item]);
@@ -219,28 +239,39 @@ function MessageItem({
/>
</div>
)}
<div
className={cn({
[theme === 'dark'
? styles.messageTextDark
: styles.messageText]: isAssistant,
[styles.messageUserText]: !isAssistant,
'bg-bg-card': !isAssistant,
})}
>
{item.data ? (
children
) : sendLoading && isEmpty(item.content) ? (
<>{!isShare && 'running...'}</>
) : (
<MarkdownContent
loading={loading}
content={item.content}
reference={reference}
clickDocumentButton={clickDocumentButton}
></MarkdownContent>
)}
</div>
{/* Show PDF download button if download info is present */}
{pdfDownloadInfo && (
<PDFDownloadButton
downloadInfo={pdfDownloadInfo}
className="mb-2"
/>
)}
{/* Show message content if there's any text besides the download */}
{messageContent && (
<div
className={cn({
[theme === 'dark'
? styles.messageTextDark
: styles.messageText]: isAssistant,
[styles.messageUserText]: !isAssistant,
'bg-bg-card': !isAssistant,
})}
>
{item.data ? (
children
) : sendLoading && isEmpty(messageContent) ? (
<>{!isShare && 'running...'}</>
) : (
<MarkdownContent
loading={loading}
content={messageContent}
reference={reference}
clickDocumentButton={clickDocumentButton}
></MarkdownContent>
)}
</div>
)}
{isAssistant && referenceDocuments.length > 0 && (
<ReferenceDocumentList
list={referenceDocuments}
@@ -248,7 +279,9 @@ function MessageItem({
)}
{isUser && (
<UploadedMessageFiles files={item.files}></UploadedMessageFiles>
<UploadedMessageFiles
files={item.files as File[] | UploadResponseDataType[]}
></UploadedMessageFiles>
)}
{/* {isAssistant && item.attachment && item.attachment.doc_id && (
<div className="w-full flex items-center justify-end">


@@ -0,0 +1,196 @@
import { Button } from '@/components/ui/button';
import { Download, FileText } from 'lucide-react';
import { useCallback } from 'react';
interface DocumentDownloadInfo {
filename: string;
base64: string;
mime_type: string;
}
interface DocumentDownloadButtonProps {
downloadInfo: DocumentDownloadInfo;
className?: string;
}
export function PDFDownloadButton({
downloadInfo,
className,
}: DocumentDownloadButtonProps) {
const handleDownload = useCallback(() => {
try {
// Convert base64 to blob
const byteCharacters = atob(downloadInfo.base64);
const byteNumbers = new Array(byteCharacters.length);
for (let i = 0; i < byteCharacters.length; i++) {
byteNumbers[i] = byteCharacters.charCodeAt(i);
}
const byteArray = new Uint8Array(byteNumbers);
const blob = new Blob([byteArray], { type: downloadInfo.mime_type });
// Create download link
const url = window.URL.createObjectURL(blob);
const link = document.createElement('a');
link.href = url;
link.download = downloadInfo.filename;
document.body.appendChild(link);
link.click();
// Cleanup
document.body.removeChild(link);
window.URL.revokeObjectURL(url);
} catch (error) {
console.error('Error downloading document:', error);
}
}, [downloadInfo]);
// Determine document type from mime_type or filename
const getDocumentType = () => {
if (downloadInfo.mime_type === 'application/pdf') return 'PDF Document';
if (
downloadInfo.mime_type ===
'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
)
return 'Word Document';
if (downloadInfo.mime_type === 'text/plain') return 'Text Document';
// Fallback to file extension
const ext = downloadInfo.filename.split('.').pop()?.toUpperCase();
if (ext === 'PDF') return 'PDF Document';
if (ext === 'DOCX') return 'Word Document';
if (ext === 'TXT') return 'Text Document';
return 'Document';
};
return (
<div
className={`flex items-center gap-3 p-4 border rounded-lg bg-background-card ${className || ''}`}
>
<div className="flex-shrink-0">
<div className="p-2 bg-accent-primary/10 rounded-lg">
<FileText className="w-6 h-6 text-accent-primary" />
</div>
</div>
<div className="flex-1 min-w-0">
<div className="font-medium text-sm truncate">
{downloadInfo.filename}
</div>
<div className="text-xs text-muted-foreground">{getDocumentType()}</div>
</div>
<Button
onClick={handleDownload}
size="sm"
className="flex items-center gap-2"
>
<Download className="w-4 h-4" />
Download
</Button>
</div>
);
}
// Helper function to detect if content contains document download info
export function extractPDFDownloadInfo(
content: string,
): DocumentDownloadInfo | null {
try {
// Try to parse as JSON first (for pure JSON content)
const parsed = JSON.parse(content);
if (parsed && parsed.filename && parsed.base64 && parsed.mime_type) {
// Accept PDF, DOCX, and TXT formats
const validMimeTypes = [
'application/pdf',
'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'text/plain',
];
if (validMimeTypes.includes(parsed.mime_type)) {
return parsed as DocumentDownloadInfo;
}
}
} catch {
// If direct parsing fails, try to extract a JSON object from mixed content
// Look for a JSON object that contains the required fields
// The regex locates a candidate opening brace; the loop below counts braces to find its match
const startPattern = /\{[^{}]*"filename"[^{}]*:/g;
let match;
while ((match = startPattern.exec(content)) !== null) {
const startIndex = match.index;
let braceCount = 0;
let endIndex = startIndex;
// Find the matching closing brace
for (let i = startIndex; i < content.length; i++) {
if (content[i] === '{') braceCount++;
if (content[i] === '}') braceCount--;
if (braceCount === 0) {
endIndex = i + 1;
break;
}
}
if (endIndex > startIndex) {
try {
const jsonStr = content.substring(startIndex, endIndex);
const parsed = JSON.parse(jsonStr);
if (parsed && parsed.filename && parsed.base64 && parsed.mime_type) {
// Accept PDF, DOCX, and TXT formats
const validMimeTypes = [
'application/pdf',
'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'text/plain',
];
if (validMimeTypes.includes(parsed.mime_type)) {
return parsed as DocumentDownloadInfo;
}
}
} catch {
// This wasn't valid JSON, continue searching
}
}
}
}
return null;
}
// Helper function to remove document download info from content
export function removePDFDownloadInfo(
content: string,
downloadInfo: DocumentDownloadInfo,
): string {
try {
// First, check if the entire content is just the JSON (most common case)
try {
const parsed = JSON.parse(content);
if (
parsed &&
parsed.filename === downloadInfo.filename &&
parsed.base64 === downloadInfo.base64
) {
// The entire content is just the download JSON, return empty
return '';
}
} catch {
// Content is not pure JSON, continue with removal
}
// Try to remove the JSON string from content
const jsonStr = JSON.stringify(downloadInfo);
let cleaned = content.replace(jsonStr, '').trim();
// Also try with pretty-printed JSON (with indentation)
const prettyJsonStr = JSON.stringify(downloadInfo, null, 2);
cleaned = cleaned.replace(prettyJsonStr, '').trim();
// Also try to find and remove JSON object pattern from mixed content
// This handles cases where the JSON might have different formatting
const startPattern = /\{[^{}]*"filename"[^{}]*"base64"[^{}]*\}/g;
cleaned = cleaned.replace(startPattern, '').trim();
return cleaned;
} catch {
return content;
}
}
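
The brace counting in `extractPDFDownloadInfo` treats every `{` and `}` alike, so a brace inside a string value (for example a filename such as `a}b.txt`) would end the scan early and the candidate would fail to parse. One possible refinement, sketched here under the assumption that such inputs can occur, is to skip braces inside JSON string literals. `findJsonObjectEnd` is a hypothetical helper and is not part of this PR.

```typescript
// Hypothetical helper (not part of this PR): returns the index just past
// the closing brace that matches the "{" at `start`, or -1 if none is found.
// Braces inside JSON string literals are ignored.
function findJsonObjectEnd(content: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < content.length; i++) {
    const ch = content[i];
    if (inString) {
      if (escaped) {
        escaped = false; // the character after a backslash is taken literally
      } else if (ch === '\\') {
        escaped = true;
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

// Illustrative mixed content with a brace inside the filename.
const mixed =
  'Here is your file: {"filename":"a}b.txt","base64":"SGVsbG8=","mime_type":"text/plain"} Enjoy!';
const start = mixed.indexOf('{');
const end = findJsonObjectEnd(mixed, start);
const info = JSON.parse(mixed.substring(start, end));
console.log(info.filename); // a}b.txt
```

Because the helper returns the exact end offset, the same offsets could also be used to cut the JSON out of the message text instead of relying on a second regex.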