FileDigest Examples (Real Output Packet)

Download a reproducible public demo packet: the original source files plus the real FileDigest outputs, including per-source Markdown, HTML, Docling DocTags, JSON, and RAG chunks.


Start here to inspect the actual output FileDigest produces. This demo packet was generated by uploading real public-domain or permissively licensed files through the live FileDigest app, and the files below are the stored artifacts from that job.

Public demo packet

The featured document is the NIST AI Risk Management Framework (NIST AI 100-1), a born-digital government report dense with tables and structured sections, which shows Docling extracting clean, faithful structure from a real layout. The packet also includes an image-only scanned page to show automatic OCR recovering text from a scan that has no selectable text.

Item | Value
Production job | df56be0156354d259b5b63b4e08dabd4
Final status | SUCCEEDED
Files parsed | 7 of 7
Output tokens | 24,017
RAG chunks | 69
Warnings | None
Engine | Docling on Modal L4

Download the generated outputs

What is inside each manifest source

The upgraded engine returns more than plain text. For every source file, manifest.json includes a representations block with:

  • markdown: clean Markdown for that source.
  • html: rendered HTML.
  • doctags: Docling DocTags (structured layout tokens with positions).
  • docling_json: the full DoclingDocument JSON.
  • chunks: heading-contextualized chunks ready to embed for retrieval.
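To feed the chunks into an embedding pipeline, one option is to flatten them across all sources into simple records. The sketch below assumes a hypothetical manifest layout (a top-level "sources" list whose entries carry "name" and "representations", with each chunk holding "text" and an optional "headings" list); this page does not document those field names, so adjust them to the real manifest.json.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterator


def load_manifest(path: str | Path) -> dict[str, Any]:
    """Read a downloaded manifest.json from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def iter_chunks(manifest: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat record per chunk, across every source in the manifest.

    HYPOTHETICAL layout: manifest["sources"] is a list of objects with a
    "name" and a "representations" dict; representations["chunks"] is a
    list of objects with "text" and an optional "headings" list.
    Rename these keys to match the real manifest.json.
    """
    for source in manifest.get("sources", []):
        chunks = source.get("representations", {}).get("chunks", [])
        for index, chunk in enumerate(chunks):
            yield {
                "source": source.get("name"),
                "index": index,
                "text": chunk["text"],
                # Keep the heading path alongside the text as retrieval metadata.
                "headings": chunk.get("headings", []),
            }
```

For example, `records = list(iter_chunks(load_manifest("manifest.json")))` gives one record per chunk, tagged with its source file name, ready to pass to whatever embedding model you use.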

In the app these are shown in a side-by-side viewer: the original file on the left and any representation (Markdown, HTML, Chunks, DocTags, JSON) on the right, so you can confirm that tables, headings, and figures were extracted correctly before using the output.

Download the original inputs

File | Source | License / status
nist-ai-risk-management-framework.pdf | NIST AI Risk Management Framework (NIST AI 100-1, January 2023) | Public domain (US Gov)
scanned-field-log.pdf | Image-only scan generated for this demo (auto-OCR showcase) | CC0-1.0
Earth_Lithograph.pdf | NASA Earth Lithograph | NASA educational media
ffc.docx | file-format-commons DOCX sample | CC0-1.0
ffc.pptx | file-format-commons PPTX sample | CC0-1.0
mdn-beginner-html-index.html | MDN beginner HTML sample | CC0-1.0
good-readme-template.md | Public README template | CC0-1.0

How to reproduce it

Download the input files above, open the FileDigest dashboard, then drag and drop, paste, or select them. Processing starts automatically (there is no separate upload-then-process step) and takes you to a live job view. Keep the default fast extraction mode. Your outputs should follow the same contract (the same per-source representations in manifest.json), though token counts and worker time may vary with engine updates.
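Because token counts and timings can drift, a reproduction is easier to check structurally: same source files, and each one carrying the five representations listed earlier. The sketch below compares two parsed manifests this way. It assumes a hypothetical layout (a top-level "sources" list of objects with "name" and "representations"), which this page does not specify; adjust the keys to the real manifest.json.

```python
from __future__ import annotations

from typing import Any

# The five per-source representations this page describes.
REPRESENTATION_KEYS = frozenset({"markdown", "html", "doctags", "docling_json", "chunks"})


def contract_diff(expected: dict[str, Any], actual: dict[str, Any]) -> list[str]:
    """List structural differences, ignoring token counts and timings.

    HYPOTHETICAL layout: manifest["sources"] is a list of
    {"name": ..., "representations": {...}} objects.
    """

    def by_name(manifest: dict[str, Any]) -> dict[str, dict[str, Any]]:
        return {str(s.get("name")): s for s in manifest.get("sources", [])}

    exp, act = by_name(expected), by_name(actual)
    problems: list[str] = []
    for name in sorted(exp.keys() - act.keys()):
        problems.append(f"missing source: {name}")
    for name in sorted(act.keys() - exp.keys()):
        problems.append(f"unexpected source: {name}")
    for name in sorted(exp.keys() & act.keys()):
        # Only key presence is checked; representation contents may differ.
        missing = REPRESENTATION_KEYS - act[name].get("representations", {}).keys()
        if missing:
            problems.append(f"{name}: missing {', '.join(sorted(missing))}")
    return problems
```

An empty list means your run matches the demo packet's shape, even if the Markdown text or chunk counts differ slightly.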

Use it from an agent

The same job runs behind the API. Submit it with POST /v1/parse, authenticated with a Bearer API key, then poll GET /v1/jobs/{id}; once the job completes, the response includes the digest plus the per-source representations and chunks. See the OpenAPI spec and agent docs.
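A minimal submit-and-poll loop might look like the sketch below, using only the Python standard library. Several details are assumptions not stated on this page: the base URL, the request body format for POST /v1/parse (defer to the OpenAPI spec), the "id" and "status" response fields, and every terminal status name except SUCCEEDED, which appears in the demo job above. The polling function takes a fetch callable so it can be exercised without a network.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Any, Callable

# "SUCCEEDED" appears in the demo job; "FAILED" and "CANCELLED" are assumptions.
TERMINAL_STATUSES = frozenset({"SUCCEEDED", "FAILED", "CANCELLED"})


def _request(method: str, url: str, api_key: str, body: bytes | None = None,
             content_type: str | None = None) -> dict[str, Any]:
    """Send an authenticated request and decode the JSON response."""
    headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
    if content_type:
        headers["Content-Type"] = content_type
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def submit_parse(base_url: str, api_key: str, body: bytes, content_type: str) -> dict[str, Any]:
    """POST /v1/parse. The body format comes from the OpenAPI spec, not from here."""
    return _request("POST", f"{base_url}/v1/parse", api_key, body, content_type)


def get_job(base_url: str, api_key: str, job_id: str) -> dict[str, Any]:
    """GET /v1/jobs/{id}."""
    return _request("GET", f"{base_url}/v1/jobs/{job_id}", api_key)


def wait_for_job(fetch: Callable[[], dict[str, Any]], interval: float = 2.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll until the job reports a terminal status; raise if the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

Assuming the submit response carries the job ID under "id", usage would be `job = submit_parse(BASE_URL, API_KEY, body, content_type)` followed by `result = wait_for_job(lambda: get_job(BASE_URL, API_KEY, job["id"]))`. Check `result["status"]` before reading the digest and representations.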