FileDigest Examples (Real Output Packet)
Download a reproducible public demo packet: the original source files plus the real FileDigest outputs, including per-source Markdown, HTML, Docling DocTags, JSON, and RAG chunks.
Start here to inspect the actual output FileDigest produces. This public demo packet was generated by uploading real public or permissively licensed files through the live FileDigest app. The files below are the actual stored artifacts from that job.
Public demo packet
The featured document is the NIST AI Risk Management Framework (NIST AI 100-1), a born-digital government report dense with tables and structured sections, so you can see Docling extract clean, faithful structure from real layout. The packet also includes an image-only scanned page to show automatic OCR recovering text from an unselectable scan.
| Item | Value |
|---|---|
| Production job | df56be0156354d259b5b63b4e08dabd4 |
| Final status | SUCCEEDED |
| Files parsed | 7 of 7 |
| Output tokens | 24,017 |
| RAG chunks | 69 |
| Warnings | None |
| Engine | Docling on Modal L4 |
Download the generated outputs
- Download digest.md: the combined, source-organized Markdown context pack.
- Download manifest.json: structured run metadata plus, for each source, the full set of representations (see below).
- Download provenance.json: source URLs, hashes, and job provenance.
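If you want to confirm that a downloaded input matches a hash recorded in provenance.json, a check along these lines may help. This is a minimal sketch: the hash algorithm is assumed to be SHA-256 (the page does not name it), and `file_hash` and `matches_recorded_hash` are hypothetical helper names, not part of FileDigest. The layout of provenance.json is not shown here, so the recorded hash is passed in as a plain string.

```python
import hashlib


def file_hash(path, algorithm="sha256", block_size=65536):
    """Return the hex digest of a file, read in blocks to keep memory flat.

    ASSUMPTION: SHA-256 is only a default guess; use whichever algorithm
    provenance.json actually records.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex, algorithm="sha256"):
    """Compare a local file against a hex hash copied from provenance.json."""
    return file_hash(path, algorithm) == recorded_hex.strip().lower()
```

For example, `matches_recorded_hash("Earth_Lithograph.pdf", recorded)` would return True only if the local copy is byte-identical to the file that was hashed, where `recorded` is the value you copied from provenance.json.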
What is inside each manifest source
The upgraded engine returns more than plain text. For every source file, manifest.json includes a representations block with:
- markdown: clean Markdown for that source.
- html: rendered HTML.
- doctags: Docling DocTags (structured layout tokens with positions).
- docling_json: the full DoclingDocument JSON.
- chunks: heading-contextualized chunks ready to embed for retrieval.
In the app these are shown in a side-by-side viewer: the original file on the left and any representation (Markdown, HTML, Chunks, DocTags, JSON) on the right, so you can confirm tables, headings, and figures landed correctly before using the output.
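To check which representations each source carries without opening the viewer, the manifest can be inspected with a few lines of Python. This is a sketch under stated assumptions: the five representation keys come from the list above, but the top-level `sources` list and the per-source `filename` field are guesses about the manifest layout, and `summarize_sources` is a hypothetical helper name. Adjust the keys to match the manifest.json you downloaded.

```python
import json

# Representation keys named in the description of manifest.json.
REPRESENTATION_KEYS = ("markdown", "html", "doctags", "docling_json", "chunks")


def load_manifest(path):
    """Read manifest.json from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_sources(manifest):
    """List, per source, which representations are present and how many chunks.

    ASSUMPTION: sources live in a top-level "sources" list and each entry has
    a "filename" field; only the "representations" block is documented.
    """
    summary = []
    for source in manifest.get("sources", []):
        reps = source.get("representations", {})
        summary.append({
            "filename": source.get("filename", "<unknown>"),
            "present": [key for key in REPRESENTATION_KEYS if key in reps],
            "chunk_count": len(reps.get("chunks") or []),
        })
    return summary
```

Calling `summarize_sources(load_manifest("manifest.json"))` returns one row per source, which makes it easy to spot a file that came back without chunks or DocTags.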
Download the original inputs
| File | Source | License / status |
|---|---|---|
| nist-ai-risk-management-framework.pdf | NIST AI Risk Management Framework (NIST AI 100-1, January 2023) | Public domain (US Gov) |
| scanned-field-log.pdf | Image-only scan generated for this demo (auto-OCR showcase) | CC0-1.0 |
| Earth_Lithograph.pdf | NASA Earth Lithograph | NASA educational media |
| ffc.docx | file-format-commons DOCX sample | CC0-1.0 |
| ffc.pptx | file-format-commons PPTX sample | CC0-1.0 |
| mdn-beginner-html-index.html | MDN beginner HTML sample | CC0-1.0 |
| good-readme-template.md | Public README template | CC0-1.0 |
How this packet was produced
- Seven public or permissively licensed files were collected and archived.
- The files were uploaded through the app into private storage.
- The job was processed by the production Modal Docling engine (worker time 21.3 seconds on this run).
- The generated digest.md and manifest.json were downloaded from the job detail page.
- The production job was deleted after the public artifact copies were saved.
This is one public demo packet, not a universal benchmark. It does not prove how every scanned, damaged, encrypted, image-heavy, or unusually formatted file will parse. It does show the exact output contract the live app produces on a mixed public packet.
How to reproduce it
Download the input files above, open the FileDigest dashboard, then drop, paste, or choose them. Processing starts automatically (there is no separate upload-then-process step) and routes you to a live job view. Keep the default fast extraction mode. The outputs should follow the same contract, though token counts and worker time may vary with engine updates.
Use it from an agent
The same job runs behind the API. Submit with POST /v1/parse (Bearer key) and poll GET /v1/jobs/{id}. When the job completes, the response carries the digest plus the per-source representations and chunks. See the OpenAPI spec and agent docs.
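The polling half of that flow might look like the sketch below, using only the Python standard library. Several details are assumptions rather than documented behavior: `BASE_URL` is a placeholder, the job JSON is assumed to expose a `status` field, and `FAILED` is a guessed terminal value (only `SUCCEEDED` appears on this page). The request body for POST /v1/parse is not described here, so the sketch leaves submission to the OpenAPI spec and starts from a job id.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.invalid"  # ASSUMPTION: placeholder, not the real host


def build_request(method, path, api_key, body=None):
    """Build an authenticated request with a Bearer key."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send_json(req):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_id, api_key, transport=send_json,
                 interval=2.0, max_polls=60, sleep=time.sleep):
    """Poll GET /v1/jobs/{id} until the job reaches a terminal status.

    ASSUMPTION: the response has a "status" field; "FAILED" is a guess.
    """
    for _ in range(max_polls):
        job = transport(build_request("GET", f"/v1/jobs/{job_id}", api_key))
        if job.get("status") in ("SUCCEEDED", "FAILED"):
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

The `transport` and `sleep` parameters are injectable so an agent can swap in its own HTTP client or test the loop without network access. Check the OpenAPI spec for the actual field names before relying on this.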