LLM Context Pack Generator from Documents

Turn PDFs, Office files, and scans into an LLM-ready context pack: a Markdown digest, manifest, and RAG chunks built on Docling and warm GPUs.


FileDigest generates an LLM context pack from your documents by converting each source into clean, AI-ready text and structure. You drop, paste, or choose a file, processing starts automatically, and you get back a combined digest.md, a manifest.json, and per-source Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks.

What an LLM context pack contains

A context pack is the prepared layer between raw files and a language model. Instead of pasting raw PDF text into a prompt, you get the same predictable set of artifacts for every job:

  • A combined digest.md that merges all sources into one readable Markdown document.
  • A manifest.json that describes what was processed (file metadata and the artifacts produced).
  • Per-source outputs in several shapes: Markdown, HTML, Docling DocTags, and Docling JSON.
  • Heading-contextualized RAG chunks, so each chunk carries the section heading it came from instead of floating free.
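To make "heading-contextualized" concrete, here is a minimal sketch of the idea in Python. It is not FileDigest's or Docling's actual chunker; the names `Chunk` and `chunk_by_heading` are hypothetical, and it assumes plain ATX-style Markdown headings (`#`, `##`, ...) with no special handling for `#` lines inside code fences.

```python
import re
from dataclasses import dataclass

# Matches ATX headings such as "## Methods" (1 to 6 leading '#').
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    headings: list  # heading path, outermost first
    text: str       # body text found under that heading path

    def contextualized(self) -> str:
        """Return the body text prefixed with its heading path."""
        prefix = " > ".join(self.headings)
        return f"{prefix}\n\n{self.text}" if prefix else self.text


def chunk_by_heading(markdown: str) -> list:
    """Split Markdown into chunks, one per run of text under a heading."""
    chunks = []
    path = []  # stack of (level, title) for the current heading path
    body = []  # lines collected since the last heading

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk([title for _, title in path], text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous heading
            level = len(match.group(1))
            # A new heading replaces any open heading at the same or deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

Calling `contextualized()` on a chunk found under "Results" inside "Report" yields text that starts with `Report > Results`, so a retriever sees where the passage came from even when the chunk is read in isolation.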

Every output is viewable side by side with the original PDF, so you can confirm that a table, figure, or heading was extracted faithfully before you ship the text to a model.

How the generation pipeline works

The flow is a single step. You drop, paste, or choose a file and processing begins immediately; there is no separate "process" button. The page then routes you straight to a live job view where you can watch the run progress.

Conversion uses Docling running on warm Modal L4 GPUs. The converter and models load once per warm container, so the first job pays the startup cost and repeat jobs reuse that warm state and finish faster. Supported inputs are broad: PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles, so a mixed folder of files becomes a single coherent pack.
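The load-once behavior can be illustrated with a generic caching pattern. This is only a sketch of the technique, not the service's actual code: `Converter`, `get_converter`, and `handle_job` are hypothetical names, and the sleep stands in for loading real models.

```python
import time
from functools import lru_cache


class Converter:
    """Hypothetical stand-in for an expensive converter and its models."""

    loads = 0  # counts how many times the expensive setup ran

    def __init__(self) -> None:
        Converter.loads += 1
        time.sleep(0.05)  # simulate slow model loading

    def convert(self, name: str) -> str:
        return f"# {name}\n"


@lru_cache(maxsize=1)
def get_converter() -> Converter:
    """Build the converter on first use, then reuse it for the process lifetime."""
    return Converter()


def handle_job(name: str) -> str:
    # The first call pays the startup cost; later calls hit the cached instance.
    return get_converter().convert(name)
```

With this pattern, running `handle_job` twice in the same process constructs `Converter` only once, which mirrors why repeat jobs on a warm container finish faster than the first.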

Scanned PDFs are detected automatically and OCR is applied without you toggling anything. Optional enrichments go further when you need them: formulas converted to LaTeX, code handling, picture descriptions, and a high-accuracy VLM tier for difficult documents.

Build packs by hand or call the API

For one-off work, the upload-and-go interface is the fastest path. For pipelines and agents, FileDigest exposes an agentic REST API so the same context packs can be generated programmatically.

You submit a job with POST /v1/parse and poll results with GET /v1/jobs/{id}. Requests use Bearer key authentication, idempotency keys keep retries safe, and errors come back as RFC 9457 problem+json so failures are machine-readable. The full schema is published as OpenAPI 3.1 at /openapi.json, and agent-oriented documentation lives at /llms.txt so a coding agent can discover how to drive the service on its own.
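A client for this flow might look like the sketch below, using only the Python standard library. Several details are assumptions, because they are not stated here and should be checked against /openapi.json: the host `api.example.com` is a placeholder, the `Idempotency-Key` header name and the JSON request body are guesses, and the job `status` field with values `succeeded` and `failed` is hypothetical. The `title`, `status`, and `detail` members read from error bodies are the standard ones defined by RFC 9457.

```python
import json
import time
import uuid
import urllib.error
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def build_parse_request(api_key: str, payload: dict,
                        idempotency_key: Optional[str] = None) -> urllib.request.Request:
    """Build the POST /v1/parse request with Bearer auth and an idempotency key."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Assumed header name; reuse the same key when retrying the same job.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def describe_problem(body: bytes) -> str:
    """Summarize an RFC 9457 problem+json body from its standard members."""
    problem = json.loads(body)
    summary = f"{problem.get('status', '?')} {problem.get('title', 'Unknown error')}"
    detail = problem.get("detail", "")
    return f"{summary}: {detail}" if detail else summary


def send(request: urllib.request.Request) -> dict:
    """Send a request and surface problem+json errors as readable exceptions."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if "application/problem+json" in err.headers.get("Content-Type", ""):
            raise RuntimeError(describe_problem(err.read())) from err
        raise


def poll_job(fetch: Callable[[str], dict], job_id: str, interval: float = 2.0,
             max_attempts: int = 30, sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call GET /v1/jobs/{id} through `fetch` until the job reaches a final state."""
    for _ in range(max_attempts):
        job = fetch(f"/v1/jobs/{job_id}")
        if job.get("status") in {"succeeded", "failed"}:  # hypothetical values
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

`poll_job` takes the fetch function as a parameter so the polling logic can be exercised without a network; in real use, `fetch` would build an authenticated GET request for the given path and pass it to `send`.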

Keeping document packs private

Context packs often hold sensitive material, so storage is private and per-user. Every download passes an authenticated ownership check, and files are served through private signed download links rather than public URLs. FileDigest offers Free, Pro, and Business plans, with OCR, larger jobs, and higher token quotas available on the paid tiers.
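For readers unfamiliar with signed links, the following sketch shows one common way to combine an ownership check with an expiring signature, using HMAC-SHA256. How FileDigest actually signs its links is not specified here; the secret, the query parameter names, and the functions `sign_link` and `verify_link` are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical; a real key would live in server config


def _signature(path: str, user_id: str, expires: int) -> str:
    message = f"{path}|{user_id}|{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_link(path: str, user_id: str, ttl_seconds: int = 300, now=None) -> str:
    """Return a link to `path` that is valid for one user until it expires."""
    issued = now if now is not None else time.time()
    expires = int(issued) + ttl_seconds
    query = urlencode({"user": user_id, "expires": expires,
                       "sig": _signature(path, user_id, expires)})
    return f"{path}?{query}"


def verify_link(url: str, requesting_user: str, now=None) -> bool:
    """Accept the link only for its owner, before expiry, with a valid signature."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        user = params["user"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    if user != requesting_user or current > expires:
        return False  # ownership check or expiry failed
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(sig, _signature(parsed.path, user, expires))
```

Because the path, the user, and the expiry are all covered by the signature, changing any of them in the URL invalidates the link.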

FAQ

Generating an LLM context pack from a PDF

Drop, paste, or choose the PDF in FileDigest and processing starts automatically; you are then routed to a live job view. When the job finishes, you get a combined digest.md, a manifest.json, per-source Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks, all viewable next to the original PDF.

Supported file types

PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles. A ZIP lets you submit a whole folder of mixed files and receive one combined pack across all of them.

Scanned documents and OCR

Scanned PDFs are detected automatically and OCR is applied without any manual setting. You can also enable optional enrichments such as formulas to LaTeX, code handling, picture descriptions, and a high-accuracy VLM tier for harder documents.

Programmatic context pack generation

Context packs can be generated programmatically through the REST API: POST /v1/parse starts a job and GET /v1/jobs/{id} fetches results, with Bearer key auth, idempotency keys, and RFC 9457 problem+json errors. The OpenAPI 3.1 spec is at /openapi.json and agent docs are at /llms.txt.