FileDigest Examples (Real Output Packet)

Download a reproducible public demo packet: the original source files plus the real FileDigest outputs, including per-source Markdown, HTML, Docling DocTags, JSON, and RAG chunks.

Start here to inspect the output FileDigest produces. This public demo packet was generated by uploading public or permissively licensed files through the live FileDigest app. The files below are the stored artifacts from that job.

Public demo packet

The featured document is the NIST AI Risk Management Framework (NIST AI 100-1), a born-digital government report dense with tables and structured sections, so you can see Docling extract clean, faithful structure from real layout. The packet also includes an image-only scanned page to show automatic OCR recovering text from an unselectable scan.

| Item | Value |
| --- | --- |
| Production job | `df56be0156354d259b5b63b4e08dabd4` |
| Final status | `SUCCEEDED` |
| Files parsed | 7 of 7 |
| Output tokens | 24,017 |
| RAG chunks | 69 |
| Warnings | None |
| Engine | Docling on GPU workers |

Download the generated outputs

Download digest.md: the combined, source-organized Markdown context pack.
Download manifest.json: structured run metadata plus, for each source, the full set of representations (see below).
Download provenance.json: source URLs, hashes, and job provenance.
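
Because provenance.json records hashes for the sources, you can check that your downloaded inputs match the ones used in this run. The sketch below is a minimal illustration, not FileDigest tooling: it assumes the hashes are SHA-256 hex digests and that you have already extracted them into a filename-to-digest mapping. The helper names `sha256_of` and `find_mismatches` are hypothetical, and the actual field names and hash algorithm should be read from provenance.json itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def find_mismatches(expected: dict[str, str], folder: Path) -> list[str]:
    """Return filenames that are missing or whose digest differs from the expected one.

    `expected` maps filename -> hex digest. Building that mapping from
    provenance.json is left out because its exact layout is an assumption here.
    """
    bad = []
    for name, want in expected.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(name)
    return bad
```

An empty list from `find_mismatches` means every listed file is present and matches its recorded digest.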

What is inside each manifest source

The upgraded engine returns more than plain text. For every source file, manifest.json includes a representations block with:

markdown: clean Markdown for that source.
html: rendered HTML.
doctags: Docling DocTags (structured layout tokens with positions).
docling_json: the full DoclingDocument JSON.
chunks: heading-contextualized chunks ready to embed for retrieval.
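
To see which representations each source carries, a short script over the downloaded manifest.json can help. This is a sketch under stated assumptions: the representation names come from the list above, but the top-level `sources` key, the per-source `name` key, and the exact spelling of the `representations` key are guesses about the layout, so adjust them to match the file you downloaded. `summarize_manifest` is a hypothetical helper, not part of FileDigest.

```python
import json
from pathlib import Path

# Representation names are taken from the list above. The "sources", "name",
# and "representations" keys are assumptions about the manifest layout.
REPRESENTATIONS = ("markdown", "html", "doctags", "docling_json", "chunks")


def summarize_manifest(manifest: dict) -> list[dict]:
    """Report which representations each source carries and how many chunks it has."""
    rows = []
    for source in manifest.get("sources", []):
        reps = source.get("representations", {})
        chunks = reps.get("chunks") or []
        rows.append({
            "name": source.get("name", "<unnamed>"),
            "present": [key for key in REPRESENTATIONS if key in reps],
            "chunk_count": len(chunks),
        })
    return rows


if __name__ == "__main__":
    path = Path("manifest.json")
    if path.exists():
        manifest = json.loads(path.read_text(encoding="utf-8"))
        for row in summarize_manifest(manifest):
            print(row["name"], row["present"], row["chunk_count"])
```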

In the app these are shown in a side-by-side viewer: the original file on the left and any representation (Markdown, HTML, Chunks, DocTags, JSON) on the right, so you can confirm tables, headings, and figures landed correctly before using the output.

Download the original inputs

| File | Source | License / status |
| --- | --- | --- |
| nist-ai-risk-management-framework.pdf | NIST AI Risk Management Framework (NIST AI 100-1, January 2023) | Public domain (US Gov) |
| scanned-field-log.pdf | Image-only scan generated for this demo (auto-OCR showcase) | CC0-1.0 |
| Earth_Lithograph.pdf | NASA Earth Lithograph | NASA educational media |
| ffc.docx | file-format-commons DOCX sample | CC0-1.0 |
| ffc.pptx | file-format-commons PPTX sample | CC0-1.0 |
| mdn-beginner-html-index.html | MDN beginner HTML sample | CC0-1.0 |
| good-readme-template.md | Public README template | CC0-1.0 |

How this packet was produced

Seven public or permissively licensed files were collected and archived.
The files were uploaded through the app into private storage.
The job was processed by the production Docling engine (worker time 21.3 seconds on this run).
The generated digest.md and manifest.json were downloaded from the job detail page.
The production job was deleted after the public artifact copies were saved.

This is one public demo packet, not a universal benchmark. It does not predict how every scanned, damaged, encrypted, image-heavy, or unusually formatted file will parse. It does show the exact output contract the live app produces on a mixed public packet.

How to reproduce it

Download the input files above, open the FileDigest dashboard, then drop, paste, or choose them. Processing starts automatically (there is no separate upload-then-process step) and routes you to a live job view. Keep the default fast extraction mode. The outputs should follow the same contract, though token counts and worker time may vary with engine updates.

Use it from an agent

The same job runs behind the API. Submit a job with POST /v1/parse, authenticating with a Bearer key, then poll GET /v1/jobs/{id}. When the job completes, the response carries the digest plus the per-source representations and chunks. See the OpenAPI spec and agent docs.
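
The polling half of that flow might look like the sketch below. Only the two paths, the Bearer scheme, and the `SUCCEEDED` status come from this page. The host in `BASE_URL` is a placeholder, the `status` field name and the `FAILED` status are assumptions, and `call` and `wait_for_job` are hypothetical helpers. The request body for POST /v1/parse is not described here, so take it from the OpenAPI spec rather than from this example.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.invalid"  # placeholder host, not the real endpoint
TERMINAL = {"SUCCEEDED", "FAILED"}  # SUCCEEDED appears above; FAILED is assumed


def call(method, path, api_key, body=None):
    """Send one JSON request with a Bearer key and return the decoded reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(fetch, job_id, interval=2.0, max_polls=60, sleep=time.sleep):
    """Poll GET /v1/jobs/{id} until a terminal status or the poll budget runs out.

    `fetch(method, path)` performs the request; passing it in keeps the loop
    independent of the HTTP client. The "status" field name is an assumption.
    """
    for _ in range(max_polls):
        job = fetch("GET", f"/v1/jobs/{job_id}")
        if job.get("status") in TERMINAL:
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

To use it against a real host, bind your key with `functools.partial(call, api_key=...)` and pass the result as `fetch`. The poll budget keeps an agent from waiting forever on a job that never reaches a terminal status.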