feat: Add optional document metadata in OpenAI-compatible response references (#12950)

### What problem does this PR solve?

This PR adds an opt‑in way to include document‑level metadata in
OpenAI‑compatible reference chunks. Until now, metadata could be used
for filtering but was not returned in responses. With this change,
clients can show richer citations (author, year, source, etc.), while
payload size and privacy stay under control through an explicit request
flag (`reference_metadata.include`) and an optional field allowlist
(`reference_metadata.fields`).
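
As a rough illustration of the "richer citations" use case, a client could turn one reference chunk into a display string along these lines. This is a minimal sketch, not code from this PR: `format_citation` is a hypothetical helper, and the chunk is assumed to be a plain dict with the `document_name` and `document_metadata` keys shown in the response examples.

```python
from typing import Any, Mapping


def format_citation(chunk: Mapping[str, Any]) -> str:
    """Build a short citation string from one reference chunk (hypothetical helper)."""
    # `document_metadata` is optional: it is absent unless the request opted in.
    metadata = chunk.get("document_metadata") or {}
    name = chunk.get("document_name", "unknown document")
    # Keep only the metadata keys that are present and non-empty, in a fixed order.
    details = [str(metadata[key]) for key in ("author", "year", "source") if metadata.get(key)]
    return f"{name} ({', '.join(details)})" if details else name


# Sample chunk using the values from the documented response example.
chunk = {
    "document_id": "4bdd2ff65e1511f0907f09f583941b45",
    "document_name": "INSTALL22.md",
    "document_metadata": {"author": "bob", "year": "2023", "source": "internal"},
}
print(format_citation(chunk))  # INSTALL22.md (bob, 2023, internal)
```

Because the helper falls back to the bare document name, it also works for responses where metadata was not requested.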

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Contribution during my time at RAGcon GmbH.
Levi
2026-02-05 02:54:33 +01:00
committed by GitHub
parent 2843570d8e
commit 803b480f9c
3 changed files with 97 additions and 5 deletions


@ -65,6 +65,10 @@ curl --request POST \
"stream": true,
"extra_body": {
"reference": true,
"reference_metadata": {
"include": true,
"fields": ["author", "year", "source"]
},
"metadata_condition": {
"logic": "and",
"conditions": [
@ -93,6 +97,9 @@ curl --request POST \
- `extra_body` (*Body parameter*) `object`
  Extra request parameters:
  - `reference`: `boolean` - include reference in the final chunk (stream) or in the final message (non-stream).
  - `reference_metadata`: `object` - include document metadata in each reference chunk.
    - `include`: `boolean` - enable document metadata in reference chunks.
    - `fields`: `list[string]` - optional allowlist of metadata keys. Omit to include all. Use an empty list to include none.
  - `metadata_condition`: `object` - metadata filter conditions applied to retrieval results.
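
The `fields` semantics described above (omitted means all keys, an empty list means none) can be sketched as follows. This only illustrates the documented behavior and is not the server's actual implementation; `select_metadata` is a hypothetical name.

```python
from typing import Any, Dict, Optional, Sequence


def select_metadata(
    metadata: Dict[str, Any],
    fields: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Apply an optional allowlist to a document's metadata (illustrative only)."""
    # `fields` omitted (None): every metadata key is returned.
    if fields is None:
        return dict(metadata)
    # `fields` given: keep only allowlisted keys; an empty list therefore keeps nothing.
    allowed = set(fields)
    return {key: value for key, value in metadata.items() if key in allowed}


doc_meta = {"author": "bob", "year": "2023", "source": "internal"}
print(select_metadata(doc_meta))              # all three keys
print(select_metadata(doc_meta, ["author"]))  # {'author': 'bob'}
print(select_metadata(doc_meta, []))          # {}
```

Treating `None` and `[]` differently is what lets a caller distinguish "no allowlist" from "allow nothing".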
#### Response
@ -275,6 +282,11 @@ data: {
"content": "```cd /usr/ports/editors/neovim/ && make install```## Android[Termux](https://github.com/termux/termux-app) offers a Neovim package.",
"document_id": "4bdd2ff65e1511f0907f09f583941b45",
"document_name": "INSTALL22.md",
"document_metadata": {
"author": "bob",
"year": "2023",
"source": "internal"
},
"dataset_id": "456ce60c5e1511f0907f09f583941b45",
"image_id": "",
"positions": [
@ -345,6 +357,11 @@ Non-stream:
"doc_type": "",
"document_id": "4bdd2ff65e1511f0907f09f583941b45",
"document_name": "INSTALL22.md",
"document_metadata": {
"author": "bob",
"year": "2023",
"source": "internal"
},
"id": "4b8935ac0a22deb1",
"image_id": "",
"positions": [
@ -3948,6 +3965,8 @@ data: {
data:[DONE]
```
When `extra_body.reference_metadata.include` is `true`, each reference chunk may include a `document_metadata` object.
Non-stream:
```json


@ -83,7 +83,13 @@ completion = client.chat.completions.create(
{"role": "user", "content": "Can you tell me how to install neovim"},
],
stream=stream,
-extra_body={"reference": reference}
+extra_body={
+    "reference": reference,
+    "reference_metadata": {
+        "include": True,
+        "fields": ["author", "year", "source"],
+    },
+}
)
if stream:
@ -98,6 +104,8 @@ else:
print(completion.choices[0].message.reference)
```
When `extra_body.reference_metadata.include` is `true`, each reference chunk may include a `document_metadata` object in both streaming and non-streaming responses.
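
One way a caller might assemble the `extra_body` argument used in the example above is a small helper like the one below. It is a sketch under assumptions: `build_extra_body` is a hypothetical function and not part of any SDK, and it simply leaves out `fields` when no allowlist is wanted, so that all metadata keys are requested.

```python
from typing import Any, Dict, Optional, Sequence


def build_extra_body(
    reference: bool = True,
    include_metadata: bool = False,
    fields: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Assemble the `extra_body` dict for a chat completion request (hypothetical helper)."""
    extra_body: Dict[str, Any] = {"reference": reference}
    if include_metadata:
        reference_metadata: Dict[str, Any] = {"include": True}
        # Only send `fields` when an allowlist was given; omitting it requests all keys.
        if fields is not None:
            reference_metadata["fields"] = list(fields)
        extra_body["reference_metadata"] = reference_metadata
    return extra_body


print(build_extra_body(include_metadata=True, fields=["author", "year", "source"]))
```

The resulting dict can be passed as `extra_body=...` to `client.chat.completions.create`, for streaming and non-streaming calls alike.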
## DATASET MANAGEMENT
---
@ -1518,6 +1526,8 @@ A list of `Chunk` objects representing references to the message, each containin
The ID of the referenced document.
- `document_name` `str`
The name of the referenced document.
- `document_metadata` `dict`
Optional document metadata, returned only when `extra_body.reference_metadata.include` is `true`.
- `position` `list[str]`
The location information of the chunk within the referenced document.
- `dataset_id` `str`
@ -1643,6 +1653,8 @@ A list of `Chunk` objects representing references to the message, each containin
The ID of the referenced document.
- `document_name` `str`
The name of the referenced document.
- `document_metadata` `dict`
Optional document metadata, returned only when `extra_body.reference_metadata.include` is `true`.
- `position` `list[str]`
The location information of the chunk within the referenced document.
- `dataset_id` `str`
@ -2596,4 +2608,3 @@ memory_object.get_message_content(message_id)
```
---