# FileDigest Full Public Context
FileDigest converts PDFs, DOCX, PPTX, TXT, Markdown, HTML, and ZIP bundles into AI-ready Markdown digests and manifest.json files. This file is a crawlable public context bundle generated from product pages, help docs, and articles.
# Product Pages
## AI Document Processing Self-Test
URL: https://filedigest.dev/ai-document-processing-benchmark
Description: A practical self-test checklist for PDF, DOCX, ZIP, OCR, Markdown, and manifest quality before scaling FileDigest usage.
Before you spend money on traffic or paid document workflows, judge FileDigest by completed jobs, readable output, and repeatable artifacts. This checklist gives buyers and operators a simple way to test whether document preparation is ready for production AI work.
Start with the [public demo packet](/examples) if you want a reproducible baseline, then run your own files. A meaningful self-test should use documents whose source quality you understand, because damaged scans, unusual tables, and encrypted files can change the result.
[Download this checklist](/samples/filedigest-benchmark-checklist.md)
## Test packet
Run one small packet with:
- one text-based PDF
- one OCR-heavy PDF if your plan supports OCR
- one DOCX file
- one PPTX file
- one HTML export
- one plain-text or Markdown note
- one ZIP containing mixed supported files
## What to inspect
| Check | Good sign | Bad sign |
| --- | --- | --- |
| File count | Every accepted file appears in the manifest | A file disappears without a warning |
| Page count | PDFs show plausible page counts; DOCX page counts are labeled when unknown | Counts are missing without explanation or clearly impossible |
| Source boundaries | Digest sections keep file/source IDs visible | The digest becomes one blended summary |
| Tables | Important tables remain readable or are flagged | Tables silently collapse into unusable text |
| Warnings | Problems are explicit in the manifest | The run pretends imperfect files were perfect |
| Downloads | Digest and manifest download through authenticated routes | Artifacts are public or inaccessible to the owner |
| Downstream reuse | The digest supports a defined ChatGPT, Claude, RAG, or analysis prompt | The output requires heavy manual cleanup before use |
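The file-count check in the table can be automated with a few lines. This is a minimal sketch, assuming the manifest exposes a top-level `sources` list whose entries carry a `filename` field; both names are hypothetical, so adjust them to the fields in your actual `manifest.json`.

```python
def missing_files(uploaded, manifest):
    """Return uploaded file names that do not appear in the manifest.

    Assumes (hypothetically) that manifest["sources"] is a list of dicts,
    each with a "filename" key. Check the real manifest.json for the
    actual field names.
    """
    listed = {source.get("filename") for source in manifest.get("sources", [])}
    return sorted(name for name in uploaded if name not in listed)


# Illustrative data only, not real FileDigest output.
sample_manifest = {
    "sources": [
        {"filename": "report.pdf", "warnings": []},
        {"filename": "notes.md", "warnings": []},
    ]
}
uploaded_names = ["report.pdf", "notes.md", "deck.pptx"]
print(missing_files(uploaded_names, sample_manifest))
```

Any name this returns is a file that disappeared from the run, which is the "bad sign" in the File count row unless the manifest also records a warning for it.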
## Suggested scoring
Use a simple 0 to 2 score for each item:
- `0`: failed or unusable
- `1`: usable with human cleanup
- `2`: usable as AI-ready context
This score is a practical self-test heuristic, not a certified benchmark. A first packet is promising when most checks score `2` and there are no privacy, access-control, or silent-data-loss failures.
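The heuristic can be written down as a small function. This sketch treats "most" as strictly more than half of the checks, which is an assumption rather than a rule stated by the checklist, and the `blocking_failures` parameter is a hypothetical way to record privacy, access-control, or silent-data-loss problems.

```python
def packet_is_promising(scores, blocking_failures=()):
    """Apply the 0-2 self-test heuristic to one packet.

    scores: mapping of check name -> 0, 1, or 2.
    blocking_failures: any privacy, access-control, or silent-data-loss
    failures seen during the run (hypothetical parameter).
    "Most" is interpreted as strictly more than half (an assumption).
    """
    if any(value not in (0, 1, 2) for value in scores.values()):
        raise ValueError("each score must be 0, 1, or 2")
    if blocking_failures:
        return False
    twos = sum(1 for value in scores.values() if value == 2)
    return twos > len(scores) / 2


# Example scores for the seven checks in the inspection table.
example_scores = {
    "File count": 2,
    "Page count": 2,
    "Source boundaries": 2,
    "Tables": 1,
    "Warnings": 2,
    "Downloads": 2,
    "Downstream reuse": 1,
}
print(packet_is_promising(example_scores))
```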
## Quality standards
The output should make source boundaries visible, preserve enough structure to review, and show failures in the manifest instead of pretending every file was perfect.
## What FileDigest is optimizing for
FileDigest is not a black-box summarizer. The goal is AI-ready document preparation: inspectable Markdown, structured manifests, private downloads, and repeatable context packs.
## Consulting Document Packets
URL: https://filedigest.dev/consulting-document-packets
Description: Turn client packets, reports, decks, notes, and policy documents into AI-ready Markdown context packs.
Consulting and analyst work often involves many source files: client reports, policy PDFs, market notes, exported pages, and internal drafts. FileDigest prepares those files for AI-assisted synthesis without asking the user to hand-clean each document first.
## Best fit
FileDigest is useful when the same team repeatedly prepares document packets for memo drafting, proposal work, market scans, due diligence, or internal analysis.
## What the user gets
- a readable `digest.md`
- a structured `manifest.json`
- job history
- private downloads
- plan limits before expensive processing starts
## Boundary
FileDigest prepares documents. It does not replace human review, legal review, client judgment, or source verification.
## Hosted Docling UI & Web Interface for Doc Conversion
URL: https://filedigest.dev/docling-ui
Description: FileDigest is a hosted Docling UI: upload a PDF, DOCX, or image and get Markdown, JSON, DocTags, and RAG chunks from Docling on warm Modal L4 GPUs.
FileDigest is a hosted web interface for [Docling](https://github.com/docling-project/docling) document conversion. You drop, paste, or choose a file in the browser, and FileDigest runs Docling on warm Modal L4 GPUs to produce AI-ready Markdown, JSON, DocTags, and retrieval chunks, with no local install, GPU setup, or model downloads required.
## What a hosted Docling UI does for you
Docling is an excellent open-source document converter, but running it yourself means installing Python dependencies, downloading models, provisioning a GPU, and wiring up upload handling, storage, and job tracking. FileDigest packages all of that as a hosted product so you can use Docling from any browser.
Upload is one step. You drop a file onto the page, paste it, or choose it from disk, and processing starts automatically. There is no separate "process" button to hunt for. The app routes you straight to a live job view where you watch the conversion progress and then inspect the results.
The conversion engine is Docling running on warm Modal L4 GPUs. The converter and its models load once per warm container and stay resident, so repeat jobs do not pay the cold-start model-loading cost every time. That keeps interactive, back-to-back conversions fast.
## File types you can convert
FileDigest accepts the document and data formats Docling handles, including:
- PDF (digital and scanned)
- DOCX, PPTX, and XLSX
- Images
- TXT, Markdown, HTML, and CSV
- ZIP bundles, so you can submit many sources in a single job
Scanned PDFs are detected automatically and OCR is applied without extra configuration. Optional enrichments let you convert formulas to LaTeX, extract code, and generate picture descriptions, and a high-accuracy VLM tier is available when you need the cleanest possible structure recovery.
## Outputs you can actually use
Every job produces a combined `digest.md` and a `manifest.json`, plus a full set of per-source artifacts. For each source file you get:
- Markdown
- HTML
- Docling DocTags
- Docling JSON
- Heading-contextualized RAG chunks, ready for embedding and retrieval
You can view every output side-by-side with the original PDF, so you can confirm that tables, headings, and figures landed where they should before you ship the result into a pipeline. The combined `digest.md` gives people and LLMs one readable artifact, while `manifest.json` gives automation a structured record of what was produced.
## An API for agents, not just a UI
The same conversion runs behind an agentic REST API, so what you test in the web interface is exactly what your code calls in production. Submit a job with `POST /v1/parse` and poll it with `GET /v1/jobs/{id}`. Authentication uses a Bearer key. The full contract is published as OpenAPI 3.1 at `/openapi.json`, errors follow the RFC 9457 problem+json format, and idempotency keys let agents retry safely without creating duplicate jobs. Agent-oriented documentation lives at `/llms.txt` so coding assistants can discover the API on their own.
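As a rough illustration, the two calls can be prepared with the Python standard library. Several details here are assumptions and not confirmed by this page: the API is assumed to share the `https://filedigest.dev` host, the idempotency key is assumed to travel in an `Idempotency-Key` header, and the request body and content type are placeholders. The published spec at `/openapi.json` is the authority for all three.

```python
import urllib.request
import uuid

BASE_URL = "https://filedigest.dev"  # assumption: API is served from the site host


def build_parse_request(api_key, body, content_type, idempotency_key=None):
    """Prepare (but do not send) a POST /v1/parse request.

    The body format is not described here, so the caller supplies raw bytes.
    The "Idempotency-Key" header name is an assumed convention.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": content_type,
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/parse", data=body, headers=headers, method="POST"
    )


def build_job_request(api_key, job_id):
    """Prepare (but do not send) a GET /v1/jobs/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Placeholder key and body; nothing is sent over the network here.
request = build_parse_request("YOUR_API_KEY", b"{}", "application/json", "demo-key-1")
print(request.get_method(), request.full_url)
```

Passing either request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be read and run without credentials.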
## Privacy and plans
Storage is private and per-user. Ownership checks gate every request, and downloads are served through private signed links rather than public URLs. FileDigest offers Free, Pro, and Business plans. Paid tiers add OCR, support larger jobs, and raise token quotas, so you can move from trying a single document to running it at scale.
## FAQ
### Is FileDigest a hosted version of Docling?
Yes. FileDigest runs the open-source Docling engine on warm Modal L4 GPUs and wraps it in a browser UI plus a REST API, so you get Docling conversions without installing the library, downloading models, or managing a GPU.
### What output formats does the Docling UI produce?
Each job returns a combined `digest.md` and `manifest.json`. For every source file you also get Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks, all viewable next to the original PDF.
### Can it convert scanned PDFs and images?
Yes. Scanned PDFs are detected automatically and OCR is applied. Images are supported as direct inputs, and optional enrichments can convert formulas to LaTeX, extract code, and describe pictures.
### Can I call FileDigest from code or an AI agent?
Yes. Use `POST /v1/parse` to submit a job and `GET /v1/jobs/{id}` to fetch results, with Bearer-key auth. The API ships an OpenAPI 3.1 spec at `/openapi.json`, RFC 9457 problem+json errors, idempotency keys, and agent docs at `/llms.txt`.
## DOCX to Markdown for ChatGPT | FileDigest
URL: https://filedigest.dev/docx-to-markdown-for-chatgpt
Description: Convert DOCX files into clean, AI-ready Markdown for ChatGPT. Upload to FileDigest, get a digest.md plus RAG chunks, manifest, and side-by-side review.
To convert a DOCX file into Markdown for ChatGPT, upload it to FileDigest and processing starts automatically. You get back a clean `digest.md` you can paste straight into ChatGPT, plus heading-aware chunks and a manifest, all stored privately under your account.
DOCX files carry reports, proposals, research notes, and appendices that are awkward to feed to an AI tool. ChatGPT works best with plain, structured text, so converting Word documents to Markdown first removes formatting noise and preserves the heading structure the model relies on for context.
## How to convert a DOCX to Markdown with FileDigest
FileDigest uses a single-step upload. Drop, paste, or choose your `.docx` file and the job starts immediately, with no separate "process" button to hunt for. You are routed straight to a live job view where you can watch the conversion run and then inspect the results.
Conversion runs on Docling, an open document-understanding engine, hosted on warm Modal L4 GPUs. The converter and its models load once per warm container, so after the first job your repeat conversions are noticeably faster. The result is Markdown that keeps headings, lists, and tables intact rather than a flattened text dump.
## What you get back for ChatGPT
Every job produces a combined `digest.md` that is ready to paste into ChatGPT or save into a prompt packet. Alongside it you get a `manifest.json` recording file metadata, job status, artifacts, and token estimates, so you know whether the content fits your model's context window before you paste.
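A context-window check against the manifest might look like the sketch below. The `sources` list and the per-source `token_estimate` field are hypothetical names used for illustration, and token estimates vary by tokenizer, so treat the result as a guide rather than a guarantee.

```python
def total_token_estimate(manifest):
    """Sum per-source token estimates (field names are assumptions)."""
    return sum(
        source.get("token_estimate", 0) for source in manifest.get("sources", [])
    )


def fits_context(manifest, context_window, reserved_for_reply=0):
    """True when the digest plus a reply budget fits the model's window."""
    return total_token_estimate(manifest) + reserved_for_reply <= context_window


# Illustrative numbers only, not real FileDigest output.
sample_manifest = {
    "sources": [
        {"filename": "report.docx", "token_estimate": 18000},
        {"filename": "appendix.docx", "token_estimate": 6000},
    ]
}
print(
    total_token_estimate(sample_manifest),
    fits_context(sample_manifest, 32000, reserved_for_reply=4000),
)
```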
Each source also yields several views: per-source Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks. Those chunks carry their surrounding heading context, which is what you want when you are building retrieval pipelines rather than pasting a whole document into one chat. You can review everything side by side with the original to confirm nothing important was lost.
## Beyond DOCX: mixed document bundles
DOCX is rarely the only format in a real project. FileDigest also accepts PDF, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles, so you can convert a folder of mixed documents into one consistent set of Markdown artifacts for ChatGPT.
Scanned PDFs in a bundle are detected automatically and OCR is applied, so text trapped in images becomes usable Markdown. Optional enrichments can turn formulas into LaTeX, capture code blocks, and add picture descriptions, and a high-accuracy VLM tier is available when you need maximum fidelity.
## Automating DOCX conversion via API
If you want ChatGPT-ready Markdown inside an agent or a pipeline, FileDigest exposes an agentic REST API. Submit a job with `POST /v1/parse` and poll `GET /v1/jobs/{id}`, authenticating with a Bearer key. The API publishes an OpenAPI 3.1 spec at `/openapi.json`, supports idempotency keys for safe retries, and returns structured RFC 9457 problem+json errors. Agent-focused documentation lives at `/llms.txt`.
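Because errors follow RFC 9457, a client can parse them generically. The sketch below reads the standard `type`, `title`, `status`, `detail`, and `instance` members and keeps anything else as extensions. The sample error body is invented for illustration and is not a documented FileDigest response.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Problem:
    """The standard RFC 9457 members plus any extension members."""
    type: str = "about:blank"
    title: str = ""
    status: Optional[int] = None
    detail: str = ""
    instance: str = ""
    extensions: Dict[str, Any] = field(default_factory=dict)


def parse_problem(content_type, body):
    """Return a Problem for application/problem+json bodies, else None."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/problem+json":
        return None
    data = json.loads(body)
    known = {"type", "title", "status", "detail", "instance"}
    return Problem(
        type=data.get("type", "about:blank"),
        title=data.get("title", ""),
        status=data.get("status"),
        detail=data.get("detail", ""),
        instance=data.get("instance", ""),
        extensions={k: v for k, v in data.items() if k not in known},
    )


# Invented example body; real error types and fields may differ.
sample_body = (
    '{"type": "https://example.com/problems/quota-exceeded", '
    '"title": "Quota exceeded", "status": 429, '
    '"detail": "Token quota reached for this plan.", '
    '"retry_after_seconds": 30}'
)
problem = parse_problem("application/problem+json; charset=utf-8", sample_body)
print(problem.title, problem.status)
```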
Your files stay in private per-user storage with authenticated ownership checks and private signed downloads, so converted documents are not shared or left publicly reachable.
## FAQ
### Can I use the converted Markdown in ChatGPT?
Yes. The `digest.md` output is plain Markdown designed to paste into a chat. The accompanying `manifest.json` includes token estimates, which help you check that the content fits your context window before pasting.
### Does FileDigest handle scanned or image-based content in DOCX files?
FileDigest auto-detects scanned PDFs and applies OCR. For Word documents with embedded images, optional picture descriptions and the high-accuracy VLM tier help capture content that is not plain text.
### Does FileDigest support file types beyond DOCX?
Yes. Inputs include PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles, so you can process mixed-format document sets in one job.
### Is there a free plan?
FileDigest offers Free, Pro, and Business plans. Paid plans add OCR, larger jobs, and higher token quotas, which matter for big DOCX batches or heavy retrieval pipelines.
## FileDigest Examples (Real Output Packet)
URL: https://filedigest.dev/examples
Description: Download a reproducible public demo packet: the original source files plus the real FileDigest outputs, including per-source Markdown, HTML, Docling DocTags, JSON, and RAG chunks.
Start here to inspect the actual output FileDigest produces. This public demo packet was generated by uploading real public or permissively licensed files through the live FileDigest app. The files below are the actual stored artifacts from the job.
## Public demo packet
The featured document is the NIST AI Risk Management Framework (NIST AI 100-1), a born-digital government report dense with tables and structured sections, so you can see Docling extract clean, faithful structure from real layout. The packet also includes an image-only scanned page to show automatic OCR recovering text from an unselectable scan.
| Item | Value |
| --- | --- |
| Production job | `df56be0156354d259b5b63b4e08dabd4` |
| Final status | `SUCCEEDED` |
| Files parsed | 7 of 7 |
| Output tokens | 24,017 |
| RAG chunks | 69 |
| Warnings | None |
| Engine | Docling on Modal L4 |
## Download the generated outputs
- [Download digest.md](/proof/filedigest-public-demo-2026-04-30/outputs/digest.md): the combined, source-organized Markdown context pack.
- [Download manifest.json](/proof/filedigest-public-demo-2026-04-30/outputs/manifest.json): structured run metadata plus, for each source, the full set of representations (see below).
- [Download provenance.json](/proof/filedigest-public-demo-2026-04-30/provenance.json): source URLs, hashes, and job provenance.
## What is inside each manifest source
The upgraded engine returns more than plain text. For every source file, `manifest.json` includes a `representations` block with:
- `markdown`: clean Markdown for that source.
- `html`: rendered HTML.
- `doctags`: Docling DocTags (structured layout tokens with positions).
- `docling_json`: the full DoclingDocument JSON.
- `chunks`: heading-contextualized chunks ready to embed for retrieval.
In the app these are shown in a side-by-side viewer: the original file on the left and any representation (Markdown, HTML, Chunks, DocTags, JSON) on the right, so you can confirm tables, headings, and figures landed correctly before using the output.
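A quick completeness check over the `representations` block could look like this sketch. The five representation keys come from the list above; the surrounding `sources` list and the `filename` field are assumed names, so verify them against the downloaded `manifest.json`.

```python
EXPECTED_REPRESENTATIONS = ("markdown", "html", "doctags", "docling_json", "chunks")


def missing_representations(manifest):
    """Map each source name to the representation keys it lacks.

    Assumes (hypothetically) a top-level "sources" list whose entries hold
    "filename" and a "representations" dict.
    """
    report = {}
    for source in manifest.get("sources", []):
        representations = source.get("representations", {})
        absent = [
            name for name in EXPECTED_REPRESENTATIONS if name not in representations
        ]
        if absent:
            report[source.get("filename", "<unnamed>")] = absent
    return report


# Illustrative manifest fragment, not the real demo packet.
sample_manifest = {
    "sources": [
        {
            "filename": "complete.pdf",
            "representations": {
                "markdown": "# Title",
                "html": "<h1>Title</h1>",
                "doctags": "",
                "docling_json": {},
                "chunks": [],
            },
        },
        {"filename": "partial.docx", "representations": {"markdown": "text"}},
    ]
}
print(missing_representations(sample_manifest))
```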
## Download the original inputs
| File | Source | License / status |
| --- | --- | --- |
| [nist-ai-risk-management-framework.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/nist-ai-risk-management-framework.pdf) | NIST AI Risk Management Framework (NIST AI 100-1, January 2023) | Public domain (US Gov) |
| [scanned-field-log.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/scanned-field-log.pdf) | Image-only scan generated for this demo (auto-OCR showcase) | CC0-1.0 |
| [Earth_Lithograph.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/Earth_Lithograph.pdf) | NASA Earth Lithograph | NASA educational media |
| [ffc.docx](/proof/filedigest-public-demo-2026-04-30/inputs/ffc.docx) | file-format-commons DOCX sample | CC0-1.0 |
| [ffc.pptx](/proof/filedigest-public-demo-2026-04-30/inputs/ffc.pptx) | file-format-commons PPTX sample | CC0-1.0 |
| [mdn-beginner-html-index.html](/proof/filedigest-public-demo-2026-04-30/inputs/mdn-beginner-html-index.html) | MDN beginner HTML sample | CC0-1.0 |
| [good-readme-template.md](/proof/filedigest-public-demo-2026-04-30/inputs/good-readme-template.md) | Public README template | CC0-1.0 |
## How this packet was produced
1. Seven public or permissively licensed files were collected and archived.
2. The files were uploaded through the app into private storage.
3. The job was processed by the production Modal Docling engine (worker time 21.3 seconds on this run).
4. The generated `digest.md` and `manifest.json` were downloaded from the job detail page.
5. The production job was deleted after the public artifact copies were saved.
This is one public demo packet, not a universal benchmark. It does not prove how every scanned, damaged, encrypted, image-heavy, or unusually formatted file will parse. It does show the exact output contract the live app produces on a mixed public packet.
## How to reproduce it
Download the input files above, open the FileDigest dashboard, then drop, paste, or choose them. Processing starts automatically (there is no separate upload-then-process step) and routes you to a live job view. Keep the default fast extraction mode. The outputs should follow the same contract, though token counts and worker time may vary with engine updates.
## Use it from an agent
The same job runs behind the API. Submit with `POST /v1/parse` (Bearer key) and poll `GET /v1/jobs/{id}`; when complete the response carries the digest plus the per-source representations and chunks. See the [OpenAPI spec](/openapi.json) and [agent docs](/llms.txt).
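The submit-then-poll pattern can be sketched independently of the HTTP client. `SUCCEEDED` is the final status shown in the demo table; `FAILED` and `RUNNING` are assumed status names, and `fetch_job` is a hypothetical callable that performs `GET /v1/jobs/{id}` and returns the decoded JSON.

```python
import time


def wait_for_job(
    fetch_job,
    job_id,
    terminal=("SUCCEEDED", "FAILED"),  # "FAILED" is an assumed status name
    interval=2.0,
    max_attempts=60,
    sleep=time.sleep,
):
    """Poll fetch_job(job_id) until the job reaches a terminal status."""
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        if job.get("status") in terminal:
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Stand-in for the real HTTP call, so the sketch runs offline.
responses = iter(
    [{"status": "RUNNING"}, {"status": "RUNNING"}, {"status": "SUCCEEDED"}]
)
calls = []


def fake_fetch(job_id):
    calls.append(job_id)
    return next(responses)


result = wait_for_job(fake_fetch, "demo-job", sleep=lambda seconds: None)
print(result["status"], len(calls))
```

Injecting `fetch_job` and `sleep` keeps the loop testable and lets an agent swap in its own HTTP client or backoff policy.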
## FileDigest vs ChatGPT File Upload
URL: https://filedigest.dev/filedigest-vs-chatgpt-file-upload
Description: When to use ChatGPT file upload directly and when to prepare documents first with FileDigest.
ChatGPT file upload is useful for one-off reading. FileDigest is useful when document preparation needs to be repeatable, inspectable, downloadable, and reusable outside one chat session.
## Use ChatGPT upload when
- you have one or two simple files
- you only need one conversation
- you do not need reusable Markdown artifacts
- you do not need a manifest of source files and processing outcomes
## Use FileDigest when
- you have PDFs, DOCX, PPTX, notes, HTML, or ZIP bundles
- you want `digest.md` and `manifest.json`
- you need job history and private downloads
- you want to inspect context before using it with an LLM
- you are preparing material for RAG, Claude, AI coding tools, or another downstream workflow
## Product stance
FileDigest does not compete with ChatGPT as a model. It prepares better context for AI tools.
## FileDigest vs Claude Project Knowledge
URL: https://filedigest.dev/filedigest-vs-claude-project-knowledge
Description: Compare Claude project knowledge with FileDigest document preparation for reusable Markdown context packs.
Claude project knowledge is useful for keeping context inside a Claude project. FileDigest is useful when you want a portable artifact that can be inspected, downloaded, copied, and reused across tools.
## Use Claude project knowledge when
- your workflow stays inside Claude
- the files are stable
- you do not need separate Markdown and manifest artifacts
## Use FileDigest when
- you want a `digest.md` file
- you want a `manifest.json` file
- you need to process ZIP bundles or mixed document packets
- you want to reuse the same prepared context in ChatGPT, Claude, AI coding tools, notebooks, or RAG pipelines
## Product stance
FileDigest is the prep layer before the AI workspace. Claude can be one of the destinations.
## FileDigest vs Docling CLI
URL: https://filedigest.dev/filedigest-vs-docling-cli
Description: Compare a hosted FileDigest workflow with running Docling directly from the command line.
Docling is powerful infrastructure for document conversion. FileDigest wraps a Modal Docling engine in a hosted SaaS workflow with auth, uploads, billing, private storage, job history, plan limits, and downloadable artifacts.
## Use Docling CLI when
- you are comfortable scripting locally
- you only need your own machine
- you do not need subscriptions, user accounts, or hosted job history
## Use FileDigest when
- you want a browser workflow
- you want users to upload files directly to private storage
- you want Modal processing instead of local processing
- you need plan gates before expensive compute starts
- you want a reusable `digest.md` and `manifest.json`
## Product stance
FileDigest is not trying to replace Docling. It productizes a specific AI-document-prep workflow around it.
## LLM Context Pack Generator from Documents
URL: https://filedigest.dev/llm-context-pack-generator
Description: Turn PDFs, Office files, and scans into an LLM-ready context pack: a Markdown digest, manifest, and RAG chunks built on Docling and warm GPUs.
FileDigest generates an LLM context pack from your documents by converting each source into clean, AI-ready text and structure. You drop, paste, or choose a file, processing starts automatically, and you get back a combined `digest.md`, a `manifest.json`, and per-source Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks.
## What an LLM context pack contains
A context pack is the prepared layer between raw files and a language model. Instead of pasting a PDF, you get a predictable set of artifacts for every job:
- A combined `digest.md` that merges all sources into one readable Markdown document.
- A `manifest.json` that describes what was processed (file metadata and the artifacts produced).
- Per-source outputs in several shapes: Markdown, HTML, Docling DocTags, and Docling JSON.
- Heading-contextualized RAG chunks, so each chunk carries the section heading it came from instead of floating free.
Every output is viewable side-by-side with the original PDF, so you can confirm a table, figure, or heading was extracted faithfully before you ship the text to a model.
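When preparing chunks for embedding, the heading context can be kept attached to the text. The chunk shape below, a dict with a `headings` list and a `text` string, is an assumption made for illustration; the real chunk objects may already include the heading in their text or use different field names.

```python
def embedding_text(chunk):
    """Join a chunk's heading path and body into one string to embed.

    Assumes (hypothetically) that each chunk has "headings" (outermost
    first) and "text". Adjust to the actual chunk fields.
    """
    headings = chunk.get("headings") or []
    prefix = " > ".join(headings)
    text = chunk.get("text", "").strip()
    return f"{prefix}\n\n{text}" if prefix else text


# Illustrative chunk, not real FileDigest output.
sample_chunk = {
    "headings": ["Governance", "Roles and responsibilities"],
    "text": "Each team names an owner for model risk reviews.",
}
print(embedding_text(sample_chunk))
```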
## How the generation pipeline works
The flow is built to be one step. You drop, paste, or choose a file and processing begins immediately (there is no separate "process" button), then the page routes you straight to a live job view where you watch the run progress.
Conversion runs on Docling on warm Modal L4 GPUs. The converter and models load once per warm container, so the first job pays the startup cost and repeat jobs reuse that warm state and finish faster. Supported inputs are broad: PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles, so a mixed folder of files becomes a single coherent pack.
Scanned PDFs are detected automatically and OCR is applied without you toggling anything. Optional enrichments go further when you need them: formulas converted to LaTeX, code handling, picture descriptions, and a high-accuracy VLM tier for difficult documents.
## Build packs by hand or call the API
For one-off work, the upload-and-go interface is the fastest path. For pipelines and agents, FileDigest exposes an agentic REST API so the same context packs can be generated programmatically.
You submit a job with `POST /v1/parse` and poll results with `GET /v1/jobs/{id}`. Requests use Bearer key authentication, idempotency keys keep retries safe, and errors come back as RFC 9457 problem+json so failures are machine-readable. The full schema is published as OpenAPI 3.1 at `/openapi.json`, and agent-oriented documentation lives at `/llms.txt` so a coding agent can discover how to drive the service on its own.
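The point of an idempotency key is that every retry of the same submission reuses the same key. Here is a minimal sketch of that pattern; `send` is a hypothetical callable that would perform `POST /v1/parse` with the given key, and how the key is transmitted is defined by the OpenAPI spec rather than by this example.

```python
import uuid


def submit_with_retry(send, payload, attempts=3):
    """Call send(payload, key) up to `attempts` times with ONE shared key.

    Reusing the key lets the server recognise a retry of the same
    submission instead of creating a duplicate job.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(attempts):
        try:
            return send(payload, key)
        except ConnectionError as error:
            last_error = error
    raise last_error


# Stand-in sender that fails once, then succeeds.
seen_keys = []


def flaky_send(payload, idempotency_key):
    seen_keys.append(idempotency_key)
    if len(seen_keys) == 1:
        raise ConnectionError("simulated network drop")
    return {"job_id": "demo-job"}


job = submit_with_retry(flaky_send, {"file": "report.pdf"})
print(job, len(seen_keys))
```

A production client would usually add a delay between attempts; it is omitted here to keep the key-reuse logic in focus.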
## Keeping document packs private
Context packs often hold sensitive material, so storage is private and per-user. Every download passes an authenticated ownership check, and files are served through private signed download links rather than public URLs. FileDigest offers Free, Pro, and Business plans, with OCR, larger jobs, and higher token quotas available on the paid tiers.
## FAQ
### How do I generate an LLM context pack from a PDF?
Drop, paste, or choose the PDF in FileDigest and processing starts automatically, then you are routed to a live job view. When the job finishes you get a combined `digest.md`, a `manifest.json`, per-source Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks, all viewable next to the original PDF.
### Which file types are supported?
PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles. A ZIP lets you submit a whole folder of mixed files and receive one combined pack across all of them.
### Does FileDigest handle scanned documents?
Yes. Scanned PDFs are detected automatically and OCR is applied without any manual setting. You can also enable optional enrichments such as formulas to LaTeX, code handling, picture descriptions, and a high-accuracy VLM tier for harder documents.
### Can I generate context packs programmatically?
Yes. Use the REST API: `POST /v1/parse` to start a job and `GET /v1/jobs/{id}` to fetch results, with Bearer key auth, idempotency keys, and RFC 9457 problem+json errors. The OpenAPI 3.1 spec is at `/openapi.json` and agent docs are at `/llms.txt`.
## manifest.json for Processed Documents | FileDigest
URL: https://filedigest.dev/manifest-json-document-processing
Description: A manifest.json describing processed documents is a structured index of every source, output artifact, and job outcome. See what FileDigest writes and why.
A `manifest.json` describing processed documents is a structured index that records what was converted, which artifacts were produced for each source, and how the job turned out. In FileDigest, every conversion job writes a `manifest.json` alongside a human-readable `digest.md`, so software and agents can read the result without parsing prose.
## What FileDigest writes into manifest.json
When you upload a file (drop, paste, or choose, with processing starting automatically and no separate process button), FileDigest converts it and emits a machine-readable record of the run. The `manifest.json` captures the job-level facts that downstream code needs:
- Job status and per-source processing outcomes.
- Source files with their file sizes and MIME types.
- Page counts where available.
- The output artifacts generated for each source.
- Token estimates for the produced text.
- Warnings and failures, so partial results are explicit rather than silent.
Where `digest.md` is the artifact a person reads or pastes into a model, `manifest.json` is the audit and automation layer that a pipeline reads to know exactly what it received.
## The artifacts a manifest points to
FileDigest does not produce a single flattened file. For each source it generates a set of artifacts, and the manifest is the index that ties them together: a combined `digest.md`, the `manifest.json` itself, plus per-source Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks. In the app you can view these side by side with the original PDF, so you can confirm the conversion matches the source before anything is indexed or sent to a model.
Behind the scenes the conversion runs on Docling using warm Modal L4 GPUs. The converter and models load once per warm container, so repeat jobs in a session are fast. Scanned PDFs are detected automatically and OCR is applied, and optional enrichments can turn formulas into LaTeX, label code, and add picture descriptions, with a high-accuracy VLM tier available for harder documents.
## Why a manifest matters for RAG and agents
A folder of one-off conversions is hard to test and debug. A manifest turns a batch into something a RAG pipeline, evaluator, or agent workflow can reason about programmatically: it can see which files succeeded, which failed, what artifacts exist, and roughly how many tokens each one represents before it spends a single embedding call. That makes ingestion repeatable and reviewable instead of a guessing game.
Because the manifest enumerates outcomes per source, it also makes human review targeted. You route only the sources flagged with warnings or failures to a person, and let the clean ones flow straight into chunking and indexing.
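That routing step can be sketched as a simple partition of the manifest. The per-source `status` and `warnings` fields, and the `FAILED` status value, are hypothetical names chosen for illustration; map them onto whatever the real `manifest.json` uses.

```python
def split_for_review(manifest):
    """Partition source names into (clean, needs_review).

    A source needs review when its status is not "SUCCEEDED" or it has
    any warnings. Field names here are assumptions.
    """
    clean, needs_review = [], []
    for source in manifest.get("sources", []):
        flagged = source.get("status") != "SUCCEEDED" or bool(source.get("warnings"))
        (needs_review if flagged else clean).append(source.get("filename"))
    return clean, needs_review


# Illustrative manifest fragment, not real FileDigest output.
sample_manifest = {
    "sources": [
        {"filename": "report.pdf", "status": "SUCCEEDED", "warnings": []},
        {
            "filename": "scan.pdf",
            "status": "SUCCEEDED",
            "warnings": ["low OCR confidence"],
        },
        {"filename": "locked.pdf", "status": "FAILED", "warnings": []},
    ]
}
clean, needs_review = split_for_review(sample_manifest)
print(clean, needs_review)
```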
## Accepted inputs and how to retrieve the manifest
FileDigest accepts PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles. Drop any of them in and the job starts on its own, then routes you to a live job view.
For automated workflows there is an agentic REST API: `POST /v1/parse` to submit work and `GET /v1/jobs/{id}` to retrieve the result, including the manifest. The API uses Bearer key authentication, publishes an OpenAPI 3.1 spec at `/openapi.json`, supports idempotency keys, and returns RFC 9457 problem+json errors. Agent-oriented documentation lives at `/llms.txt`. Storage is private per user, with authenticated ownership checks and private signed downloads, so a manifest and its artifacts are only reachable by their owner.
## FAQ
### What is a manifest.json for processed documents?
It is a structured JSON file that describes the outcome of a document-processing job: the source files, their sizes and MIME types, page counts, the output artifacts produced, token estimates, and any warnings or failures. In FileDigest it ships with every job next to the readable `digest.md`.
### How is manifest.json different from digest.md?
`digest.md` is the human-readable, paste-into-a-model artifact. `manifest.json` is the automation layer: it is the structured record that lets pipelines, evaluators, and agents understand what was processed and what still needs review.
### Can I get the manifest through an API?
Yes. Submit a job with `POST /v1/parse` and fetch the result, including the manifest, with `GET /v1/jobs/{id}`. The API uses Bearer key auth and publishes an OpenAPI 3.1 spec at `/openapi.json`.
### Which file types produce a manifest?
Any supported input does: PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles. Every job, regardless of input type, writes a `manifest.json` indexing the artifacts generated for each source.
## Run a Docling Workflow on Modal GPUs | FileDigest
URL: https://filedigest.dev/modal-docling-workflow
Description: Run a Docling document workflow on warm Modal L4 GPUs without owning infrastructure. Upload a file and get AI-ready Markdown, RAG chunks, and a REST API.
FileDigest runs a managed Docling workflow on warm Modal L4 GPUs, so you upload a document and get back AI-ready outputs (Markdown, RAG chunks, Docling JSON, and more) without provisioning, scaling, or maintaining any GPU infrastructure yourself.
## What "running Docling on Modal GPUs" means here
Docling is the open document conversion engine that turns files like PDFs and slide decks into structured, machine-readable representations. Modal is a serverless GPU platform. FileDigest combines the two: Docling runs inside warm Modal containers backed by L4 GPUs, and the converter plus its models load once per warm container. The practical result is that repeat jobs land on an already-initialized container and run fast, because you are not paying the model-load cost on every request.
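The load-once behavior is a general pattern that is easy to show in miniature. The sketch below is not FileDigest's or Modal's code: it uses a cached factory as a stand-in for expensive model loading, so the first job in a process pays the cost and later jobs reuse the same object.

```python
import functools

load_events = []  # records each time the expensive load actually runs


@functools.lru_cache(maxsize=1)
def get_converter():
    """Stand-in for loading a converter and its models (hypothetical)."""
    load_events.append("loaded")
    return {"name": "stand-in converter"}


def handle_job(document_name):
    """Process one job, reusing the cached converter when it exists."""
    converter = get_converter()
    return f"{converter['name']} handled {document_name}"


first = handle_job("a.pdf")
second = handle_job("b.pdf")
print(len(load_events))
```

Two jobs run, but the load happens once, which mirrors why a request that lands on a warm container skips the startup cost.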
You do not write any deployment code, manage container lifecycles, or pick GPU types. FileDigest wraps the whole pipeline (authentication, plan limits, private storage, billing, job history, and downloads) around the Docling-on-Modal engine.
## How the workflow runs, step by step
The workflow is one step to start. Drop a file, paste it, or choose it from your device, and processing begins automatically. There is no separate "process" button to click. You are then routed straight to a live job view where you can watch the conversion progress.
Supported inputs cover the formats most teams actually have: PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles for batches of files.
Scanned PDFs are detected automatically and OCR is applied without you toggling anything. You can also turn on optional enrichments (converting formulas to LaTeX, extracting code, and generating picture descriptions), plus a high-accuracy VLM tier when you need maximum fidelity on complex layouts.
## What you get back from each job
Every job produces a consistent set of outputs, some for the job as a whole and some for each source file:
- A combined `digest.md` across the job and a `manifest.json` describing what was produced.
- Per-source Markdown and HTML.
- Docling DocTags and Docling JSON for structured downstream use.
- Heading-contextualized RAG chunks, ready to embed and feed into a retrieval pipeline.
You can view these outputs side by side with the original PDF, so it is easy to confirm that tables, headings, and figures were captured correctly before you trust the extraction.
## Driving the same workflow from code or an agent
The hosted Docling-on-Modal workflow is also a programmable API, which matters if you want an agent or backend service to run conversions on your behalf. You send a `POST` to `/v1/parse` to submit a job and poll `GET /v1/jobs/{id}` to retrieve status and results. Authentication uses a Bearer API key.
The API is built for automation: there is an OpenAPI 3.1 spec at `/openapi.json`, support for idempotency keys so retries do not create duplicate jobs, and errors returned as RFC 9457 problem+json so failures are machine-readable. Agent-focused documentation lives at `/llms.txt`.
Your documents stay private throughout. Storage is per-user, every download passes an authenticated ownership check, and files are served through private signed links rather than public URLs.
## FAQ
### Do I need a Modal account or GPU setup to run Docling?
No. FileDigest operates the Modal GPU infrastructure for you. You sign in, upload a file, and the Docling workflow runs on warm Modal L4 GPUs behind the scenes. There is nothing to deploy, scale, or maintain on your side.
### Why does FileDigest run Docling on warm GPUs?
Document conversion models are expensive to load. By keeping containers warm, FileDigest loads the Docling converter and its models once per container, so subsequent jobs that hit a warm container skip that startup cost and finish faster.
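The load-once behavior described above can be sketched as a cached initializer. This is a minimal illustration of the general pattern, not FileDigest's or Modal's actual code: `load_converter` and `handle_job` are hypothetical names, and the dictionary stands in for a real converter object.

```python
import functools

LOAD_COUNT = 0  # counts how many times the expensive load actually runs


@functools.lru_cache(maxsize=1)
def load_converter() -> dict:
    """Stand-in for an expensive model load; runs once per process."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": "converter-placeholder"}  # hypothetical converter object


def handle_job(document_name: str) -> str:
    """Reuse the cached converter for every job handled by this process."""
    converter = load_converter()
    return f"{document_name} converted by {converter['name']}"


# Three jobs on the same warm process trigger only one load.
for name in ["a.pdf", "b.pdf", "c.pdf"]:
    handle_job(name)
```

Only the first call pays the load cost; later calls in the same process reuse the cached object, which is the effect a warm container is meant to provide.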
### Can it handle scanned documents and complex layouts?
Yes. Scanned PDFs are detected automatically and OCR is applied. For documents with formulas, code, or images, you can enable optional enrichments (formulas to LaTeX, code extraction, picture descriptions), and a high-accuracy VLM tier is available for the hardest layouts. Note that OCR and larger jobs are part of the paid Pro and Business plans.
### How do I run a Docling workflow programmatically?
Send a `POST /v1/parse` request with a Bearer API key to start a job, then poll `GET /v1/jobs/{id}` for results. The OpenAPI 3.1 spec is at `/openapi.json`, errors follow RFC 9457 problem+json, idempotency keys prevent duplicate jobs, and agent docs are at `/llms.txt`.
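As a rough sketch of the request shapes described above, the Python below builds (without sending) the submit and status requests. The base URL, the JSON request body, and the `Idempotency-Key` header name are assumptions made for illustration; the OpenAPI spec at `/openapi.json` is the place to confirm the real schema.

```python
import json
import urllib.request

BASE_URL = "https://filedigest.dev"  # assumption: API is served from the product host


def build_parse_request(api_key: str, payload: dict, idempotency_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/parse request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/parse",
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON body
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,  # assumption: header name
        },
    )


def build_status_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/jobs/{id} request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/jobs/{job_id}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing either request to `urllib.request.urlopen` would send it; the sketch stops short of that so the shapes can be inspected without a real API key.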
## OCR a Scanned PDF to Markdown for AI
URL: https://filedigest.dev/ocr-pdf-to-markdown
Description: OCR scanned PDFs into clean, AI-ready Markdown. FileDigest auto-detects scans, runs Docling OCR on warm GPUs, and outputs digest.md plus RAG chunks.
To OCR a scanned PDF to Markdown for AI, drop the file into FileDigest. It automatically detects that the pages are scanned, applies OCR through the Docling engine, and returns clean Markdown plus heading-aware RAG chunks you can paste into ChatGPT or Claude, or feed into a retrieval pipeline. There is no separate "process" button and no OCR toggle to hunt for: scanned pages are recognized and handled for you.
## How OCR to Markdown works in FileDigest
Upload a scanned PDF by dropping, pasting, or choosing it, and processing starts immediately. The job routes to a live view where you can watch it run. FileDigest inspects the file, detects pages that have no reliable embedded text (photographed pages, image-only scans, older manuals), and applies OCR during conversion.
Conversion runs on the Docling engine on warm Modal L4 GPUs. The converter and its models load once per warm container, so the first job pays the warm-up cost and repeat jobs are fast. The result is a structured digest rather than a flat text dump, with headings, tables, and layout reconstructed into Markdown.
## What you get back
Every source produces a full set of artifacts you can view side by side with the original PDF:
- A combined `digest.md` for pasting into an LLM context window or saving as a prompt packet.
- A `manifest.json` recording file metadata, processing outcomes, pages, artifacts, and token estimates.
- Per-source Markdown, HTML, Docling DocTags, and Docling JSON.
- Heading-contextualized RAG chunks, where each chunk carries the heading path it came from, so retrieval stays accurate instead of returning orphaned fragments.
Because the OCR text lands in real Markdown structure, your model sees headings and tables in context, not a wall of recognized characters.
## Beyond plain OCR: enrichments and a VLM tier
Scanned documents often carry more than body text. FileDigest offers optional enrichments on top of OCR: formulas converted to LaTeX, code blocks preserved, and picture descriptions generated for images. For difficult scans, dense tables, or pages where standard OCR struggles, a high-accuracy VLM (vision-language model) tier is available for higher-fidelity extraction.
OCR improves extraction, but it has limits. For high-stakes work, very poor scans, or intricate tables and figures, plan on a human review pass over the output.
## More than scanned PDFs
The same one-step pipeline handles PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles. You can drop a mixed ZIP of scans and native files, and each source gets its own artifacts and entry in the manifest. If you build agents or automations, the agentic REST API mirrors the UI: `POST /v1/parse` to submit a job and `GET /v1/jobs/{id}` to poll it, with Bearer key authentication, idempotency keys, an OpenAPI 3.1 spec at `/openapi.json`, RFC 9457 problem+json errors, and agent docs at `/llms.txt`.
Your files stay in private per-user storage behind authenticated ownership checks, and downloads come through private signed links.
## FAQ
### OCR configuration
No OCR configuration is required. FileDigest detects scanned PDFs automatically and applies OCR as part of conversion. You upload the file and the engine decides what each page needs, so you do not set a flag or pick a mode.
### Using OCR output with AI systems
The OCR output is ready for AI systems. You get a clean `digest.md` for direct paste into an LLM, plus heading-contextualized RAG chunks for retrieval. A `manifest.json` with token estimates helps you fit content into a context window before sending it.
### Handling tables, formulas, and complex layouts
Turn on enrichments to convert formulas to LaTeX, preserve code, and describe pictures, or use the high-accuracy VLM tier for difficult scans and dense tables. For critical documents, review the Markdown against the original, which you can open side by side.
### OCR availability by plan
OCR and larger jobs are on the paid Pro and Business plans, which also raise token quotas. The Free plan lets you test the upload-to-digest workflow before committing to OCR-heavy processing.
## Preparing a PDF for ChatGPT (the Clean Way)
URL: https://filedigest.dev/pdf-to-chatgpt
Description: How to prepare a PDF for ChatGPT: convert it to clean, inspectable Markdown with FileDigest so the model reads structure, tables, and scanned text correctly.
The reliable way to prepare a PDF for ChatGPT is to convert it into clean Markdown first, so the model receives well-structured text instead of a raw binary that can scramble tables, headings, and scanned pages. FileDigest does this in one step: drop a PDF and it returns an inspectable `digest.md` plus per-source Markdown and RAG-ready chunks you can paste or attach.
## Why PDFs Need Prep Before ChatGPT
A PDF is a layout format, not a text format. When you hand one to ChatGPT directly, columns can interleave, table cells can collapse into run-on lines, and scanned (image-only) pages may carry no extractable text at all. The result is an assistant that answers from a distorted version of your document.
Preparing the file first solves this. By converting the PDF to structured Markdown, you give the model clear headings, intact tables, and ordered reading flow. You also get a copy you can actually read yourself, so you can confirm the AI is seeing what you think it is before you ever ask a question.
## How to Prepare a PDF With FileDigest
FileDigest is built around a one-step upload: drop, paste, or choose a file and processing starts automatically (there is no separate "process" button), then it routes you to a live job view. Behind the scenes it runs Docling on warm Modal L4 GPUs, where the converter and models load once per warm container, so repeat jobs return quickly.
Each job returns a combined `digest.md` and a `manifest.json`, plus per-source outputs in several formats: Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks. Every output is viewable side-by-side with the original PDF, so you can spot-check the conversion page by page before sending anything to ChatGPT.
For pasting into ChatGPT, the `digest.md` or per-source Markdown is usually what you want. For building a retrieval system or a custom GPT, the heading-contextualized RAG chunks are designed to drop straight into a vector store.
## Scanned PDFs, Formulas, and Mixed Documents
Scanned PDFs are detected automatically and OCR is applied, so image-only documents become real text rather than empty pages. Optional enrichments go further: formulas can be converted to LaTeX, code blocks recognized, and pictures described, with a high-accuracy VLM tier available for demanding documents.
FileDigest is not limited to PDFs. You can feed it DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles, which is useful when the context you want in ChatGPT is spread across a slide deck, a spreadsheet, and a few PDFs at once.
## Going Beyond Copy-Paste: the API
If you are wiring documents into an agent rather than chatting by hand, FileDigest exposes an agentic REST API. You POST to `/v1/parse` and poll `GET /v1/jobs/{id}`, using Bearer key authentication. The OpenAPI 3.1 spec lives at `/openapi.json`, idempotency keys prevent duplicate jobs, and errors follow the RFC 9457 problem+json format. Agent-oriented documentation is published at `/llms.txt`.
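To show why problem+json errors are easy for an agent to handle, here is a small sketch that parses the standard RFC 9457 members (`type`, `title`, `status`, `detail`, `instance`). The `Problem` class and `parse_problem` function are hypothetical names, and any extension members FileDigest may add are not covered.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    """The standard RFC 9457 members; all but `type` may be absent."""
    type: str
    title: Optional[str]
    status: Optional[int]
    detail: Optional[str]
    instance: Optional[str]


def parse_problem(body: bytes) -> Problem:
    """Parse an application/problem+json response body."""
    data = json.loads(body)
    return Problem(
        type=data.get("type", "about:blank"),  # RFC 9457 default when omitted
        title=data.get("title"),
        status=data.get("status"),
        detail=data.get("detail"),
        instance=data.get("instance"),
    )
```

An agent can branch on `status` or `type` instead of pattern-matching free-text error messages.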
## Privacy of Your Documents
Uploaded files live in private per-user storage with authenticated ownership checks, and downloads are served through private signed links. Your documents are not shared across accounts. FileDigest offers Free, Pro, and Business plans, with OCR, larger jobs, and higher token quotas on the paid tiers.
## FAQ
### When to prepare a PDF versus uploading directly
For a quick read of a simple, text-based PDF, direct upload can be fine. Prepare it first when the document has tables, multiple columns, scanned pages, or when you want a reusable, inspectable Markdown copy that you can verify and reuse across multiple chats or tools.
### Output format for ChatGPT
Use the `digest.md` or the per-source Markdown. Markdown preserves headings and table structure in a way ChatGPT parses cleanly, and you can view it side-by-side with the original PDF to confirm accuracy first.
### Scanned PDF handling
Scanned PDFs are supported. FileDigest automatically detects scanned, image-only PDFs and applies OCR, so the text becomes usable in ChatGPT. Optional enrichments and a high-accuracy VLM tier are available for harder documents like dense forms or formula-heavy pages.
### Supported document formats
Alongside PDF, FileDigest accepts DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles, so you can prepare a mixed set of documents into one consistent context for ChatGPT.
## Preparing a PDF for Claude
URL: https://filedigest.dev/pdf-to-claude
Description: Turn any PDF into clean, inspectable Markdown context for Claude. FileDigest converts, OCRs, and chunks documents into AI-ready files in one step.
To prepare a PDF for Claude, convert it into clean Markdown before you prompt, so the model reads accurate text instead of raw page layout or scanned images. FileDigest does this in one step: upload a PDF and it returns a readable `digest.md`, structured outputs, and RAG-ready chunks you can paste into Claude, attach to a Claude Project, or feed an agent.
## Why convert a PDF instead of uploading it raw
Claude can accept files directly, and that is fine for a quick one-off question. But raw PDFs carry layout noise: columns get scrambled, tables flatten, headers and footers repeat, and scanned pages contain no selectable text at all. When the same document will be reused across many prompts, shared with teammates, or queried by an agent, it pays to prepare it once into clean, inspectable text.
FileDigest produces that prepared text and keeps it alongside the original so you can verify the conversion was faithful before you trust it in a prompt.
## One-step conversion to Claude-ready Markdown
Drop, paste, or choose a file and processing starts automatically. There is no separate "process" button, and the page routes straight to a live job view so you can watch the conversion run.
For every source file you get:
- a combined `digest.md` for readable AI context
- a `manifest.json` listing source files, outcomes, warnings, and token estimates
- per-source Markdown, HTML, Docling DocTags, and Docling JSON
- heading-contextualized RAG chunks for retrieval workflows
The outputs are viewable side by side with the original PDF, so you can confirm a table or figure caption survived the conversion before pasting it into Claude.
Inputs are not limited to PDF. FileDigest also accepts DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles, so a mixed packet becomes one consistent context pack.
## Scanned PDFs, formulas, and figures handled automatically
Scanned PDFs are detected automatically and OCR is applied, so image-only documents still produce real text. Optional enrichments turn formulas into LaTeX, capture code blocks, and generate picture descriptions, and a high-accuracy VLM tier is available when layout fidelity matters most.
Under the hood, conversion runs on Docling using warm Modal L4 GPUs. The converter and models load once per warm container, so repeat jobs run fast rather than paying cold-start cost every time.
## Built for agents and pipelines, not just copy-paste
If Claude is driving an automated workflow, you do not have to use the web UI at all. FileDigest exposes an agentic REST API: `POST /v1/parse` to submit a document and `GET /v1/jobs/{id}` to poll for results. It uses Bearer key auth, publishes an OpenAPI 3.1 spec at `/openapi.json`, supports idempotency keys for safe retries, and returns RFC 9457 problem+json errors. Agent-readable docs live at `/llms.txt`.
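The point of an idempotency key is that a retried request carries the same key as the original attempt. The sketch below assumes a caller-supplied `send` function (hypothetical) that performs the actual `POST /v1/parse` and raises `ConnectionError` on transport failures; how the key is transmitted is left to that function.

```python
import uuid
from typing import Callable, Optional


def submit_with_retries(send: Callable[[str], dict], max_attempts: int = 3) -> dict:
    """Retry a submission, reusing one idempotency key across all attempts."""
    idempotency_key = str(uuid.uuid4())  # generated once, before the first attempt
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return send(idempotency_key)
        except ConnectionError as error:  # retry only on transport failures
            last_error = error
    raise RuntimeError("submission failed after retries") from last_error
```

Generating the key outside the loop is the important design choice: a fresh key per attempt would defeat deduplication on the server side.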
Everything stays private: per-user storage, authenticated ownership checks, and private signed downloads, so a prepared document is only accessible to you.
## FAQ
### Markdown versus PDF upload
For a one-off question, uploading the PDF directly is fastest. Paste the `digest.md` instead when the same document will be reused across prompts, shared, audited, or queried by an agent, because the prepared text is cleaner, repeatable, and downloadable.
### Scanned and image-only PDF support
Scanned PDFs are supported: they are detected automatically and OCR is applied, so you get selectable, accurate text from image-only pages. FileDigest offers Free, Pro, and Business plans; OCR, larger jobs, and higher token quotas are on Pro and Business.
### Organizing multiple files in Claude Projects
Upload your files (or a ZIP bundle) and FileDigest returns one combined `digest.md` plus a `manifest.json` with token estimates, so you can size the context for a Claude Project and attach a single clean pack instead of many loose PDFs.
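Sizing a context pack from token estimates might look like the sketch below. The manifest field names used here (`sources`, `name`, `token_estimate`) are assumptions for illustration, not the documented `manifest.json` schema, and `select_within_budget` is a hypothetical helper.

```python
def select_within_budget(manifest: dict, token_budget: int) -> list[str]:
    """Greedily pick sources in manifest order, skipping any that would exceed the budget."""
    selected: list[str] = []
    used = 0
    for source in manifest["sources"]:       # assumed key
        estimate = source["token_estimate"]  # assumed key
        if used + estimate <= token_budget:
            selected.append(source["name"])  # assumed key
            used += estimate
    return selected


example_manifest = {
    "sources": [
        {"name": "report.pdf", "token_estimate": 6000},
        {"name": "appendix.pdf", "token_estimate": 9000},
        {"name": "notes.md", "token_estimate": 1500},
    ]
}
picked = select_within_budget(example_manifest, token_budget=8000)
```

With an 8,000-token budget, the oversized appendix is skipped while the report and the notes still fit.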
### Integration with automated Claude agents
Automated Claude agents can use the REST API: `POST /v1/parse` and `GET /v1/jobs/{id}` with Bearer key auth, idempotency keys, and an OpenAPI 3.1 spec at `/openapi.json`. Agent-facing docs are at `/llms.txt`.
## PDF to Markdown for AI
URL: https://filedigest.dev/pdf-to-markdown-for-ai
Description: Convert PDFs into AI-ready Markdown digests and manifest.json files for ChatGPT, Claude, AI coding tools, and RAG workflows.
FileDigest turns PDFs into cleaner Markdown artifacts that are easier to inspect, paste into LLMs, store in prompt packets, or pass into downstream retrieval workflows.
## What it creates
Every successful job produces a readable `digest.md` and a structured `manifest.json`. The digest is for humans and LLM context windows. The manifest records file metadata, processing outcomes, pages, artifacts, and token estimates.
## Who it is for
FileDigest is useful for researchers, analysts, consultants, AI builders, and operations teams that regularly need to prepare dense PDFs before asking ChatGPT, Claude, AI coding tools, or a RAG pipeline to work with them.
## When to prepare a PDF versus uploading directly
Direct upload can be enough for one-off reading. FileDigest is for repeatable preparation: batch jobs, private artifact downloads, source manifests, token awareness, job history, and outputs that can be reused outside one chat session.
## Processing model
Document conversion runs on the Modal Docling engine. The browser never receives the Modal engine API key, and the production product does not use a local fallback processor.
## Privacy
URL: https://filedigest.dev/privacy
Description: FileDigest privacy overview.
Last updated: April 28, 2026.
FileDigest prepares user-uploaded documents for AI workflows. This notice explains what we collect, how we use it, and how to contact us about privacy or deletion requests.
## What we collect
FileDigest collects account information such as email address, authentication identifiers, subscription metadata, and basic operational logs.
When you create a document job, FileDigest stores metadata about the job, including file names, file sizes, MIME types, job status, processing options, token estimates, artifact metadata, timestamps, and error states.
Uploaded source files and generated artifacts are stored in private object storage paths associated with your user and job. Document processing runs through the Modal Docling engine.
FileDigest also collects privacy-safe analytics and attribution data, such as page views, CTA clicks, UTM parameters, ad click identifiers, landing paths, and referrer host names. We do not send document contents, file names, storage keys, email addresses, or raw referrer URLs in product analytics events.
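Reducing a referrer to its host name, rather than keeping the raw URL, can be sketched in a few lines. This illustrates the general idea only; `referrer_host` is a hypothetical helper, not FileDigest's analytics code.

```python
from urllib.parse import urlparse


def referrer_host(raw_referrer: str) -> str:
    """Reduce a full referrer URL to its host name, dropping path, query, and fragment."""
    return urlparse(raw_referrer).hostname or ""
```

The path and query string, which can contain search terms or document names, never reach the analytics event.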
## How we use data
We use account, billing, storage, and processing data to provide the FileDigest service, enforce plan limits, process documents, generate artifacts, troubleshoot failures, prevent abuse, and improve product reliability.
We use attribution and analytics data to understand which pages and campaigns lead to signup, successful document jobs, and checkout intent.
We do not sell uploaded documents. We do not intentionally train foundation models on user-uploaded files as part of the FileDigest service.
## Processors
FileDigest uses infrastructure providers for the application stack, including Vercel, Supabase, Modal, Stripe, and optional email, monitoring, and analytics providers. These providers process data only as needed to operate the service.
## Retention
Artifact retention depends on your plan. Free jobs use 72-hour artifact retention, Pro jobs use 30-day artifact retention, and Business jobs use 90-day artifact retention. Deleted jobs are designed to remove application metadata and associated private storage objects through the cleanup workflow.
## Security
The browser never receives the Modal engine API key. Downloads are served through authenticated ownership checks and short-lived signed URLs. Plan limits are enforced before expensive processing begins.
## Contact
For privacy or deletion requests, contact support@filedigest.dev.
## Private, Secure Document Processing for AI
URL: https://filedigest.dev/private-document-processing-ai
Description: FileDigest turns your files into AI-ready Markdown, RAG chunks, and structured manifests on private per-user storage with signed downloads and an agentic API.
FileDigest is private, secure document processing for AI: you drop, paste, or choose a file and it is converted into AI-ready context (Markdown, RAG chunks, and structured metadata) inside private per-user storage, with authenticated ownership checks and signed downloads so your source files and outputs stay yours. Processing starts automatically on upload and routes straight to a live job view, so there is no separate "process" button and no public exposure of your documents.
## How private processing works
Upload is one step. Drop, paste, or choose a file and the job starts immediately, then the page moves to a live view where you watch conversion happen. Behind that, FileDigest runs the Docling engine on warm Modal L4 GPUs. The converter and models load once per warm container, so repeat jobs run fast instead of paying a cold-start penalty every time.
Privacy is built into the storage layer, not added as an afterthought. Raw uploads and generated artifacts live in private, per-user storage. Every download passes an authenticated ownership check and is served through a private signed URL, so a link cannot be shared or guessed into someone else's documents.
## What you can feed it
FileDigest accepts the formats real document work actually produces: PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles. A ZIP lets you hand over a whole folder of mixed sources in a single upload instead of processing files one at a time.
Scanned PDFs are detected automatically and OCR is applied without a manual toggle, so image-only documents still come back as usable text. Optional enrichments go further when you need them: formulas converted to LaTeX, code extraction, picture descriptions, and a high-accuracy VLM tier for difficult or visually dense pages.
## What you get back
Each source produces a full set of artifacts rather than a single flattened text dump. Across a job you get a combined `digest.md` and a `manifest.json` that records what was processed and what was generated. Per source you get Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks, all viewable side by side with the original PDF so you can confirm the conversion is faithful before you trust it downstream.
The RAG chunks carry their heading context, which matters when you embed and retrieve: a chunk that knows which section it came from gives an AI model cleaner grounding than a raw, context-free slice of text.
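To make "heading context" concrete, the sketch below splits Markdown on ATX headings and tags each chunk with the heading path it sits under. It is a simplified illustration, not the Docling or FileDigest chunker: `chunk_with_heading_paths` is a hypothetical function, and it does not handle `#` lines inside fenced code blocks.

```python
import re


def chunk_with_heading_paths(markdown: str) -> list[dict]:
    """Split Markdown into chunks, each tagged with its heading path."""
    path: list[str] = []   # current heading stack, e.g. ["Manual", "Setup"]
    body: list[str] = []   # lines collected under the current heading
    chunks: list[dict] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"heading_path": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks
```

A chunk under `## Setup` inside `# Manual` carries the path `["Manual", "Setup"]`, which can be prepended to the text before embedding so the retrieved passage keeps its section context.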
## Built for agents and automation
Document processing for AI usually feeds something automated, so FileDigest exposes an agentic REST API. You POST to `/v1/parse` and poll `GET /v1/jobs/{id}`, authenticating with a Bearer key. The API publishes an OpenAPI 3.1 spec at `/openapi.json`, supports idempotency keys so retried requests do not create duplicate jobs, and returns errors as RFC 9457 problem+json so failures are machine-readable. Agent-focused documentation lives at `/llms.txt` for tools that read their own integration guides.
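A polling loop for `GET /v1/jobs/{id}` could be structured as below. The status values (`completed`, `failed`) and the `status` field name are assumptions, and `fetch_job` is a hypothetical caller-supplied function that performs the actual HTTP call and returns the decoded job object.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # assumption: real status names may differ


def poll_until_done(
    fetch_job: Callable[[], dict],
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job until the job reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:  # assumed field name
            return job
        sleep(interval_s)
    raise TimeoutError("job did not finish within the polling window")
```

Bounding the number of polls keeps an agent from waiting forever on a stuck job, and injecting `sleep` makes the loop easy to test.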
## FAQ
### Privacy in hosted environments
FileDigest is both hosted and private. Files are hosted so you do not run your own GPU pipeline, but they sit in private per-user storage. Downloads require an authenticated ownership check and are served through private signed URLs, so artifacts are not publicly listable or shareable by link alone.
### How processing starts
There is no separate step to start processing: upload is the trigger. The moment you drop, paste, or choose a file, the job starts and the page routes to a live job view where you can watch the conversion progress.
### Scanned and image-only PDF handling
FileDigest detects scanned PDFs automatically and applies OCR, so you do not have to flag them. For harder material you can enable optional enrichments (formulas to LaTeX, code, picture descriptions) and a high-accuracy VLM tier.
### Programmatic access and agent integration
Programmatic access goes through the REST API: `POST /v1/parse`, then poll `GET /v1/jobs/{id}` with a Bearer key. There is an OpenAPI 3.1 spec at `/openapi.json`, idempotency keys for safe retries, RFC 9457 problem+json errors, and agent docs at `/llms.txt`. OCR, larger jobs, and higher token quotas are available on the Pro and Business plans, with a Free plan for small test jobs.
## RAG Document Ingestion Prep
URL: https://filedigest.dev/rag-document-ingestion
Description: Prepare document batches for RAG, evaluation, and agent workflows with Markdown digests and structured manifests.
RAG quality starts before embedding. FileDigest helps convert document batches into cleaner, auditable artifacts that can be reviewed before they are chunked, embedded, indexed, or given to an agent.
## Why a manifest matters
`manifest.json` gives downstream software a structured view of the job: which files were processed, which failed, what artifacts were generated, and how large the output is. That makes ingestion easier to test and debug than a folder of one-off conversions.
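As an illustration of using the manifest as an ingestion gate, the sketch below separates processed sources from failed ones before anything is embedded. The field names (`sources`, `name`, `status`) and the `succeeded` value are assumptions, not the documented schema, and `split_by_outcome` is a hypothetical helper.

```python
def split_by_outcome(manifest: dict) -> tuple[list[str], list[str]]:
    """Return (succeeded, failed) source names from a manifest-like dict."""
    ok: list[str] = []
    failed: list[str] = []
    for source in manifest.get("sources", []):  # assumed key
        # Anything not explicitly marked as succeeded is treated as failed.
        target = ok if source.get("status") == "succeeded" else failed
        target.append(source.get("name", "<unnamed>"))
    return ok, failed
```

A pipeline could refuse to index a batch, or log a warning, whenever the second list is non-empty.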
## Why Markdown matters
`digest.md` is portable. It can be pasted into an LLM, saved with a project packet, or passed to a controlled parser before vector indexing.
## Best fit
FileDigest is a good fit for AI builders who want a hosted Docling workflow with auth, billing, private storage, job history, and Modal processing instead of rebuilding the same conversion workflow for every document batch.
## Research Paper Digestion
URL: https://filedigest.dev/research-paper-digestion
Description: Prepare research papers and literature folders for AI-assisted review with Markdown digests and structured manifests.
Research workflows often start with dense PDFs, appendices, notes, and reading folders. FileDigest helps turn that material into a cleaner context pack before asking an AI system to summarize, compare, extract, or critique it.
## Research use cases
- literature review preparation
- paper comparison
- methods and theory extraction
- reading packets for seminars or projects
- AI-assisted note synthesis
## Why a digest helps
A digest gives you a single Markdown artifact with source boundaries. You can inspect it before using it with ChatGPT, Claude, AI coding tools, or another AI tool.
## Why a manifest helps
A manifest records what was processed, what failed, how large the output is, and what artifacts exist. That makes the workflow easier to audit than a one-off upload.
## Your data control
URL: https://filedigest.dev/security
Description: How FileDigest keeps your documents yours: per-user isolated storage, signed-download access control, automatic deletion, and an engine that cannot train on your files.
**Summary:** Your documents stay yours. They live in per-user isolated storage, every download passes an ownership check and is served through a short-lived signed URL, files auto-delete on a retention clock you can shorten, and the converter is open-source and deterministic, so we have no model of our own to train on your data.
We are an independent product without compliance badges yet, so we do not wave SOC 2 or HIPAA logos around. Instead, here is exactly how your data is handled, with mechanisms you can verify.
## No model training on your documents
We do not run our own AI models. FileDigest converts documents with [Docling](https://github.com/docling-project/docling), a deterministic open-source engine. There is no proprietary model in the loop and therefore nothing to train on your files. Most document-AI tools run their own models and have to promise not to train on your data; we architecturally cannot.
## Per-user isolated storage
Every file and artifact is stored under a path scoped to your account. The download route checks that the job belongs to you and then issues a short-lived signed URL. Artifacts cannot be listed, guessed, or shared by link, and the engine API key never reaches the browser.
## Automatic deletion
Your files auto-delete on a retention clock:
- Free: 72 hours
- Pro: 30 days
- Business: 90 days
- Enterprise: custom, including zero-retention
Shorter or zero retention is available on request.
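The retention windows above translate into a simple deadline calculation. This sketch assumes the clock starts at job creation time, which the text does not state; `deletion_deadline` is a hypothetical helper, and the custom Enterprise window is left out.

```python
from datetime import datetime, timedelta, timezone

# Retention windows as listed above; Enterprise is custom and omitted here.
RETENTION = {
    "free": timedelta(hours=72),
    "pro": timedelta(days=30),
    "business": timedelta(days=90),
}


def deletion_deadline(plan: str, created_at: datetime) -> datetime:
    """Return when artifacts become eligible for deletion (assumes the clock starts at creation)."""
    return created_at + RETENTION[plan.lower()]


created = datetime(2026, 4, 28, 12, 0, tzinfo=timezone.utc)
free_deadline = deletion_deadline("Free", created)
```

For a Free job created on April 28 at 12:00 UTC, the 72-hour window ends on May 1 at 12:00 UTC.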
## Processing model
- You upload to a private storage path; the browser never holds the engine key.
- A server route registers the job and enforces your plan limits.
- Docling runs on Modal GPU workers and writes outputs under your job path.
- Downloads require an authenticated ownership check and a signed URL.
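The last step, an ownership check followed by a short-lived signed link, can be illustrated with a generic HMAC scheme. This is a sketch of the general technique under stated assumptions, not FileDigest's implementation: the function names, the query parameter names, and the 60-second lifetime are all hypothetical.

```python
import hashlib
import hmac


def sign(path: str, expires_at: int, secret: bytes) -> str:
    """HMAC-SHA256 over the path and expiry timestamp."""
    message = f"{path}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def issue_signed_link(job_owner: str, requester: str, path: str,
                      secret: bytes, now: int, ttl_s: int = 60) -> str:
    """Check ownership first, then issue a link that expires after ttl_s seconds."""
    if requester != job_owner:
        raise PermissionError("job does not belong to requester")
    expires_at = now + ttl_s
    return f"{path}?expires={expires_at}&sig={sign(path, expires_at, secret)}"


def verify_signed_link(path: str, expires_at: int, signature: str,
                       secret: bytes, now: int) -> bool:
    """Accept the link only if it is unexpired and the signature matches."""
    if now >= expires_at:
        return False
    return hmac.compare_digest(sign(path, expires_at, secret), signature)
```

Because the signature covers both the path and the expiry, a link cannot be edited to point at another artifact or to extend its own lifetime.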
## Subprocessors
We use a small, named set of vendors: Modal (processing), Supabase (auth and storage), Stripe (billing), Resend (email), and Sentry (error monitoring). Credentials are server-side environment variables and are not shipped to client bundles.
## Self-host the same engine
Because the converter is open-source Docling, an air-gapped or regulated team can run the exact same engine in its own environment. Talk to us if you want a self-hosted or dedicated-region deployment.
## Enterprise
Custom DPA, SSO, dedicated or regional processing, custom or zero retention, and an SLA are available on the Enterprise plan. Email support@filedigest.dev to start.
## Subprocessors
URL: https://filedigest.dev/subprocessors
Description: Infrastructure and service providers used to operate FileDigest document preparation, billing, storage, processing, email, monitoring, and analytics.
FileDigest uses a small infrastructure stack to operate the product. These providers process data only as needed to provide the service.
Last updated: April 29, 2026.
## Core providers
| Provider | Purpose | Data involved |
| -------- | ------- | ------------- |
| Vercel | Web app hosting, serverless routes, analytics, deployment logs | Account/session metadata, request logs, public-site analytics |
| Supabase | Authentication, Postgres metadata, private object storage | Account metadata, job metadata, uploaded files, generated outputs |
| Modal | Document-processing engine | File contents during processing and generated extraction output |
| Stripe | Subscriptions, checkout, billing portal, invoices | Billing/customer metadata, payment status, invoices |
| Resend | Transactional email | Recipient email address and email content |
| Sentry | Error monitoring and release diagnostics | Error traces, environment metadata, non-sensitive operational context |
## Data categories
Depending on the workflow, subprocessors may process account metadata, billing metadata, job metadata, operational logs, uploaded source files, and generated outputs. FileDigest aims to keep document access scoped to the services required to process, store, monitor, bill, and deliver the product.
## Regions
FileDigest uses European-region infrastructure where it is configured and available, including Supabase project storage and database placement. Some providers may process operational metadata in standard global infrastructure, and Vercel/Modal execution region behavior can vary by provider routing and capacity. Contact support for region, DPA, or enterprise procurement questions.
## Contact
For subprocessor, DPA, or deletion questions, contact support@filedigest.dev.
## Terms & Conditions
URL: https://filedigest.dev/terms
Description: FileDigest terms and acceptable use overview.
Last updated: April 28, 2026.
These terms describe the operating rules for FileDigest accounts, uploads, processing, billing, and generated artifacts.
## Service
FileDigest is a document preparation SaaS that converts uploaded files into AI-ready artifacts such as Markdown digests and JSON manifests. Processing runs on the Modal Docling engine. The service is not a legal, medical, financial, or compliance adviser.
## Accounts
You are responsible for maintaining access to your account and for activity performed under it. You may only upload files that you have the right to process.
## Acceptable use
Do not use FileDigest to process illegal material, violate third-party rights, bypass security controls, attack the service, or upload content that you are not permitted to handle.
## Billing
Paid plans are billed through Stripe. Plan limits, OCR access, monthly quotas, retention, and pricing are shown on the pricing page and may change for future customers. Existing subscriptions are managed through the billing portal.
## Uploaded content
You keep ownership of your uploaded content. FileDigest stores and processes uploaded files only to provide the service, generate artifacts, enforce limits, and operate the platform.
## Availability
FileDigest depends on third-party infrastructure providers. We aim to keep the service reliable, but processing failures, provider outages, timeouts, and document parsing errors can occur.
## Limitation of liability
Use FileDigest outputs with human review. Document conversion can be incomplete or incorrect, especially for scanned files, complex layouts, tables, figures, or OCR-heavy material.
## Contact
Questions about these terms can be sent to support@filedigest.dev.
## Use cases
URL: https://filedigest.dev/use-cases
Description: Ways teams use FileDigest to turn messy documents into clean, AI-ready context, plus how it compares to pasting files into ChatGPT, Claude, or the Docling CLI.
FileDigest does one job: it compiles a folder of documents into one clean, source-labeled context pack for ChatGPT, Claude, RAG, and agents. Below are the most common ways people use it, and a few honest comparisons. Every page links back to the same product.
## By workflow
- [Consulting document packets](/consulting-document-packets): bundle a client's mixed files into one reviewable context pack.
- [Research paper digestion](/research-paper-digestion): turn papers into clean Markdown you can actually feed a model.
- [RAG document ingestion prep](/rag-document-ingestion): get heading-contextualized chunks ready to embed.
- [LLM context pack generator](/llm-context-pack-generator): produce one context file from many documents.
- [ZIP to LLM context pack](/zip-to-llm-context): drop a ZIP, get a single context pack back.
## By conversion
- [PDF to Markdown for AI](/pdf-to-markdown-for-ai)
- [Prepare a PDF for ChatGPT](/pdf-to-chatgpt)
- [Prepare a PDF for Claude](/pdf-to-claude)
- [DOCX to Markdown for ChatGPT](/docx-to-markdown-for-chatgpt)
- [OCR a scanned PDF to Markdown](/ocr-pdf-to-markdown)
## By tooling and format
- [Hosted Docling UI](/docling-ui)
- [Run a Docling workflow on Modal GPUs](/modal-docling-workflow)
- [manifest.json for processed documents](/manifest-json-document-processing)
- [Private, secure document processing](/private-document-processing-ai)
## How it compares
- [FileDigest vs ChatGPT file upload](/filedigest-vs-chatgpt-file-upload)
- [FileDigest vs Claude project knowledge](/filedigest-vs-claude-project-knowledge)
- [FileDigest vs Docling CLI](/filedigest-vs-docling-cli)
Want the short version instead? Read [how FileDigest works](/docs) or download a [real example packet](/examples).
## ZIP to LLM Context Pack
URL: https://filedigest.dev/zip-to-llm-context
Description: Turn a ZIP bundle of PDFs, DOCX, PPTX, notes, HTML, and Markdown into one AI-ready context pack.
FileDigest is built for folders, not only single-file demos. Upload a ZIP bundle and create a digest that brings scattered source files into one inspectable Markdown output.
## Use cases
Use ZIP-to-context processing for research folders, client packets, policy archives, proposal materials, manuals, notes, exports, and document bundles that need to become LLM-ready before analysis.
## Outputs
The main output is `digest.md`, a source-organized Markdown file designed for review and AI context. The companion `manifest.json` makes the job auditable by listing source files, file sizes, processing status, artifacts, and token estimates.
## Plan limits
Free jobs are designed for small tests. Pro and Business plans unlock larger batches, OCR, higher token quotas, and longer artifact retention.
# Documentation
## How FileDigest Works
URL: https://filedigest.dev/docs
Description: A user-facing overview of FileDigest document preparation and the pipeline that runs after you upload.
FileDigest turns supported source documents into AI-ready artifacts you can inspect before using them in ChatGPT, Claude, RAG prep, or analyst workflows.
The core output is a readable `digest.md` plus a structured `manifest.json`. The digest is for humans and LLM context windows. The manifest is for file-level review, metadata checks, and repeatable downstream workflows.
### Upload a document packet
Start with PDFs, DOCX, PPTX, TXT, Markdown, HTML, or a ZIP bundle containing supported files.
### Choose processing options
Select fast text extraction for normal jobs, accurate tables when table structure matters more than speed, or OCR when your plan includes scanned-document processing.
### Review the result
Open the completed job to inspect the digest, manifest, parsed files, warnings, failed files, and token estimates.
### Download private artifacts
Download `digest.md` and `manifest.json` through authenticated, short-lived links tied to your account.
FileDigest is intentionally narrow: it prepares source documents for AI use. It is not a chat app, not a public file host, and not a replacement for human review.
## What happens after you upload
FileDigest separates upload, validation, processing, and artifact download so each step is visible.
### Create a job
The workbench checks file count, job size, estimated output tokens, OCR access, and monthly quota.
### Upload privately
Your browser uploads selected files to private storage paths assigned to your job.
### Register files
After upload, FileDigest confirms the files exist and prepares the packet for processing.
### Generate artifacts
The processing engine converts supported inputs into `digest.md` and `manifest.json`.
### Review the job
The job page shows status, warnings, failed files, digest preview, manifest preview, and private downloads.
Ready to try it? Follow [Create your first digest](/docs/first-digest), or call the same pipeline from code with the [API](/docs/api).
## API Reference
URL: https://filedigest.dev/docs/api
Description: Parse any document into AI-ready context from your own code or an agent. One endpoint to submit, one to poll.
FileDigest ships a small public API so an agent or a script can do exactly what the dashboard does: send a file, get back clean Markdown, structured per-source representations, and RAG chunks. There are two endpoints, both authenticated with a Bearer key.
## Authentication
Every request needs an API key. Create one in your dashboard under [FileDigest Settings](/dashboard/filedigest/settings), then send it as a Bearer token:
```text
Authorization: Bearer fd_live_...
```
Keys are tied to your account and your plan limits. Calls without a valid key return `401`.
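If you call the API from a script, it helps to build the header in one place. The following is a minimal sketch, not an official client: the `FILEDIGEST_API_KEY` environment variable and the `auth_headers` helper are names invented for this example. Only the Bearer scheme itself comes from the docs above.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for a FileDigest API call.

    FILEDIGEST_API_KEY is a hypothetical environment variable name chosen
    for this sketch; the docs only specify the Bearer scheme.
    """
    key = api_key or os.environ.get("FILEDIGEST_API_KEY", "")
    if not key:
        raise ValueError("missing API key: pass api_key or set FILEDIGEST_API_KEY")
    return {"Authorization": f"Bearer {key}"}
```

Reading the key from the environment keeps it out of source control. Passing `api_key` explicitly is convenient in tests.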
## Submit a parse job
`POST /v1/parse` accepts either a multipart `file` upload or a JSON body with `source_url`. It wraps the create, upload, register, and process steps in one call and returns `202` with a job id to poll.
```bash
curl -X POST https://filedigest.dev/v1/parse \
-H "Authorization: Bearer fd_live_..." \
-F "file=@report.pdf" \
-F "mode=accurate_tables"
```
To parse a file by URL instead of uploading bytes:
```bash
curl -X POST https://filedigest.dev/v1/parse \
-H "Authorization: Bearer fd_live_..." \
-H "Content-Type: application/json" \
-d '{ "source_url": "https://example.com/report.pdf", "ocr": true }'
```
Response:
```json
{ "job_id": "abc123", "status": "accepted", "poll": "/v1/jobs/abc123" }
```
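The URL-based submission above can also be sketched in Python with only the standard library. This is an illustration under stated assumptions rather than an official SDK: `build_parse_request` and `submit` are hypothetical helper names, and the host is the one used in the curl examples.

```python
import json
import urllib.request

API_BASE = "https://filedigest.dev"  # host taken from the curl examples


def build_parse_request(api_key, source_url, ocr=False):
    """Build (but do not send) a POST /v1/parse request for a source URL."""
    body = json.dumps({"source_url": source_url, "ocr": ocr}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{API_BASE}/v1/parse", data=body, headers=headers, method="POST"
    )


def submit(request):
    """Send the request and return the decoded JSON body of the 202 response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating "build" from "send" lets you inspect or log the exact request before it leaves your machine. Calling `submit(build_parse_request(key, url))` should return a body shaped like the response shown above, from which you read `job_id`.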
### Options
| Field | Values | What it does |
| --- | --- | --- |
| `mode` | `fast_text`, `accurate_tables` | Extraction strategy. Use `accurate_tables` when table structure matters more than speed. |
| `ocr` | `true`, `false` | Run OCR on scanned or image-only pages (requires a plan with OCR). |
| `quality` | `standard`, `high` | High uses the VLM pipeline for hard layouts (slower). |
| `enrich_formulas` | `true`, `false` | Convert math to LaTeX (slower). |
| `enrich_code` | `true`, `false` | Detect code blocks and language (slower). |
| `describe_pictures` | `true`, `false` | Generate image captions (slower, VLM). |
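A client can catch typos in option names before a request is sent. The sketch below mirrors the table above; `validate_options` and `slower_options` are hypothetical helpers written for this example, and the server remains the authority on what it accepts.

```python
# Allowed values copied from the options table above.
ENUM_FIELDS = {
    "mode": {"fast_text", "accurate_tables"},
    "quality": {"standard", "high"},
}
BOOLEAN_FIELDS = {"ocr", "enrich_formulas", "enrich_code", "describe_pictures"}


def validate_options(options):
    """Return a checked copy of `options`, or raise ValueError."""
    checked = {}
    for field, value in options.items():
        if field in ENUM_FIELDS:
            allowed = ENUM_FIELDS[field]
            if not isinstance(value, str) or value not in allowed:
                raise ValueError(f"{field} must be one of {sorted(allowed)}, got {value!r}")
        elif field in BOOLEAN_FIELDS:
            if not isinstance(value, bool):
                raise ValueError(f"{field} must be True or False, got {value!r}")
        else:
            raise ValueError(f"unknown option: {field!r}")
        checked[field] = value
    return checked


def slower_options(options):
    """List the enabled options that the table marks as slower."""
    slow = ["quality"] if options.get("quality") == "high" else []
    for name in ("enrich_formulas", "enrich_code", "describe_pictures"):
        if options.get(name) is True:
            slow.append(name)
    return slow
```

`slower_options` is useful for warning a user that a job may take longer before it is submitted.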
### Idempotency
Send an `Idempotency-Key` header to make retries safe. Replaying the same key returns the original job instead of creating a duplicate.
The file size limit is 100 MB per API request. Over-limit, quota, and engine errors are returned as RFC 9457 problem details with a `code` field (for example `QUOTA_EXCEEDED`, `FILE_TOO_LARGE`, `MODAL_UNAVAILABLE`).
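Retries and error handling fit together, so here is a small sketch that covers both. Several things are assumptions made for illustration: using a UUID as the key format (the docs do not prescribe one), the helper names, and the choice to treat only `MODAL_UNAVAILABLE` as worth retrying. The `status`, `title`, and `detail` members are standard RFC 9457 fields; `code` is the extension field described above.

```python
import json
import uuid

# Assumption for this sketch: only engine unavailability is transient.
RETRYABLE_CODES = {"MODAL_UNAVAILABLE"}


def new_idempotency_key():
    """Create one key per logical job and reuse it on every retry."""
    return str(uuid.uuid4())


def parse_problem(body):
    """Extract the useful members of an RFC 9457 problem-details body."""
    problem = json.loads(body)
    return {
        "code": problem.get("code"),      # extension member used by the API
        "status": problem.get("status"),  # standard RFC 9457 members below
        "title": problem.get("title"),
        "detail": problem.get("detail"),
    }


def should_retry(problem):
    """Decide whether replaying the request with the same key makes sense."""
    return problem.get("code") in RETRYABLE_CODES
```

Generate the key once before the first attempt and send it in the `Idempotency-Key` header on every replay, so a retry after a network failure returns the original job.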
## Poll for the result
`GET /v1/jobs/{id}` returns the current status. While the job is `pending` or `processing`, poll until it reaches `completed` or `failed`.
```bash
curl https://filedigest.dev/v1/jobs/abc123 \
-H "Authorization: Bearer fd_live_..."
```
A completed job carries the result inline:
```json
{
"job_id": "abc123",
"status": "completed",
"result": {
"tokens": 24017,
"parsed_files": 7,
"failed_files": 0,
"digest": "# report.pdf\n...AI-ready Markdown...",
"manifest": { }
}
}
```
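The polling step can be sketched as a loop with a timeout. This is one possible shape, not a prescribed client: `fetch_job` and `wait_for_job` are hypothetical names, and the 2-second interval and 10-minute timeout are arbitrary defaults. Any status other than `completed` or `failed` (including the `accepted` status from the submit response) is treated as "keep waiting".

```python
import json
import time
import urllib.request

API_BASE = "https://filedigest.dev"  # host taken from the curl examples
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_job(job_id, api_key):
    """GET /v1/jobs/{id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(job_id, api_key, fetch=fetch_job, interval=2.0,
                 timeout=600.0, sleep=time.sleep):
    """Poll until the job is completed or failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id, api_key)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised in tests without network calls or real waiting.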
## Output
The `result` block holds everything you need downstream:
- `digest`: the combined, source-organized Markdown context pack (the same `digest.md` you download in the app).
- `manifest`: structured run metadata plus, for each source, a `representations` block with `markdown`, `html`, `doctags`, `docling_json`, and heading-contextualized `chunks` ready to embed.
- `tokens`, `parsed_files`, `failed_files`: counts for the run.
The dashboard also exposes the matching `provenance.json` for source URLs, hashes, and job provenance. See the [Examples](/examples) page for a real packet you can download.
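Once a job completes, a script usually wants to save the artifacts and find the per-source representations. The sketch below assumes only the fields shown in this section. Because the exact layout of the manifest is not spelled out here, `find_representations` searches the whole structure for `representations` blocks instead of assuming where they live. Both helper names are hypothetical.

```python
import json
from pathlib import Path


def save_artifacts(job, out_dir):
    """Write digest.md and manifest.json from a completed job; return run counts."""
    if job.get("status") != "completed":
        raise ValueError(f"job is {job.get('status')!r}, not completed")
    result = job["result"]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "digest.md").write_text(result["digest"], encoding="utf-8")
    (out / "manifest.json").write_text(
        json.dumps(result["manifest"], indent=2), encoding="utf-8"
    )
    return {
        "tokens": result["tokens"],
        "parsed_files": result["parsed_files"],
        "failed_files": result["failed_files"],
        "needs_review": result["failed_files"] > 0,
    }


def find_representations(node):
    """Yield every `representations` dict found anywhere in the manifest."""
    if isinstance(node, dict):
        reps = node.get("representations")
        if isinstance(reps, dict):
            yield reps
        for key, value in node.items():
            if key != "representations":  # do not descend into the block itself
                yield from find_representations(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_representations(item)
```

Each yielded block can then be read for `markdown`, `html`, `doctags`, `docling_json`, or `chunks`, depending on what your downstream step needs.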
## Machine-readable contract and agent files
- [OpenAPI 3.1 spec](/openapi.json): the full machine contract for both endpoints.
- [llms.txt](/llms.txt): a short agent-discovery file describing the product and its endpoints.
- [llms-full.txt](/llms-full.txt): the expanded agent-discovery file.
These let an agent discover and call FileDigest without reading this page first.
## Dashboard Guide
URL: https://filedigest.dev/docs/dashboard-guide
Description: Main FileDigest dashboard areas and what each one is for.
## FileDigest
Create new jobs, upload source files, choose OCR or table settings, and start processing.
## Job detail
Review the current status, output preview, manifest preview, file list, warnings, and private downloads for one job.
## Usage
Check monthly output-token usage and recent processing activity.
## Billing
Review the active plan and open billing management for paid subscriptions.
## Settings
Review plan limits, retention, OCR availability, and account settings.
## Create Your First Digest
URL: https://filedigest.dev/docs/first-digest
Description: How to upload files and produce your first FileDigest output.
### Open the workbench
Sign in and open the FileDigest workbench from the dashboard.
### Choose files
Upload PDFs, DOCX, PPTX, TXT, Markdown, HTML, or a ZIP bundle. The page shows your current plan limits before processing starts.
### Select options
Use fast extraction for most jobs. Choose accurate tables for structure-heavy files. Turn on OCR only when your plan supports it and the file needs image-based text recognition.
### Start processing
FileDigest uploads files to private storage, registers the job, and starts secure document processing.
### Inspect and download
When the job finishes, review `digest.md`, inspect `manifest.json`, copy the digest, or download the private artifacts.
If a job fails, check the file list and warnings first. Most failures come from unsupported file types, oversized jobs, password-protected files, or scans that need OCR.
## Login And Email
URL: https://filedigest.dev/docs/login-email
Description: Account access, email sign-in, and support contact guidance.
FileDigest uses email-based account access so your jobs, plan, and private artifacts stay tied to your user account.
### Sign in
Use the sign-in button and enter the email address you want associated with your FileDigest work.
### Confirm access
Open the link in the sign-in email from the same browser when possible. If a link expires, request a fresh one.
### Open your dashboard
After sign-in, the dashboard opens the FileDigest workbench and your job history.
### Contact support
Use `support@filedigest.dev` for account access, billing, failed jobs, or retention questions.
## Options And Limits
URL: https://filedigest.dev/docs/options-limits
Description: Processing choices, plan limits, and retention behavior.
FileDigest checks your plan before processing starts.
## Processing options
- Fast text extraction is the default for clean digital PDFs, DOCX, PPTX, text, Markdown, and HTML.
- Accurate tables is useful when table structure matters more than speed.
- OCR is available on paid plans for scanned or image-heavy PDFs.
## Plan limits
| Plan | Files per job | Job size | Monthly output tokens | Retention |
|---|---:|---:|---:|---:|
| Free | 25 | 100 MB | 2M | 72 hours |
| Pro | 100 | 1 GB | 100M | 30 days |
| Business | 250 | 2 GB | 500M | 90 days |
Output token estimates are safeguards. They help prevent a packet from exceeding your monthly quota or producing an artifact too large for practical AI use.
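The limits in the table can be checked locally before you build a packet. The following is a rough sketch, not the check the workbench runs: `preflight` is a hypothetical helper, it assumes decimal units (1 MB = 1,000,000 bytes, 1 GB = 1,000 MB), and it expects you to supply your own token estimate and month-to-date usage.

```python
from dataclasses import dataclass

MB = 1_000_000  # assumption: decimal megabytes


@dataclass(frozen=True)
class PlanLimits:
    files_per_job: int
    job_bytes: int
    monthly_output_tokens: int
    ocr: bool


# Values copied from the plan limits table; OCR is a paid-plan feature.
PLANS = {
    "free": PlanLimits(25, 100 * MB, 2_000_000, ocr=False),
    "pro": PlanLimits(100, 1_000 * MB, 100_000_000, ocr=True),
    "business": PlanLimits(250, 2_000 * MB, 500_000_000, ocr=True),
}


def preflight(plan, file_sizes, estimated_tokens, tokens_used, wants_ocr=False):
    """Return a list of human-readable problems; an empty list means OK."""
    limits = PLANS[plan]
    problems = []
    if len(file_sizes) > limits.files_per_job:
        problems.append(f"{len(file_sizes)} files exceeds {limits.files_per_job} per job")
    if sum(file_sizes) > limits.job_bytes:
        problems.append(f"job size {sum(file_sizes)} bytes exceeds {limits.job_bytes}")
    if tokens_used + estimated_tokens > limits.monthly_output_tokens:
        problems.append("estimated output would exceed the monthly token quota")
    if wants_ocr and not limits.ocr:
        problems.append("OCR is not available on this plan")
    return problems
```

For example, `preflight("free", [60 * MB, 60 * MB], 1_500_000, 1_000_000)` reports both a size problem and a quota problem, which suggests splitting the packet or upgrading.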
## Retention
Artifacts are retained according to plan. Download important digests before the retention window closes.
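To avoid missing the window, you can compute a download deadline yourself. This sketch assumes the retention window starts when the job completes, which this page does not state, so treat the result as an estimate and confirm with support if it matters. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention windows copied from the plan limits table.
RETENTION = {
    "free": timedelta(hours=72),
    "pro": timedelta(days=30),
    "business": timedelta(days=90),
}


def artifacts_expire_at(plan, completed_at):
    """Estimate expiry, assuming the window starts at job completion."""
    return completed_at + RETENTION[plan]


def time_left(plan, completed_at, now=None):
    """Remaining retention time, never negative."""
    now = now or datetime.now(timezone.utc)
    return max(artifacts_expire_at(plan, completed_at) - now, timedelta(0))
```

Use timezone-aware datetimes for `completed_at` so the comparison with the current time is unambiguous.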
## Plans And Billing
URL: https://filedigest.dev/docs/plans-billing
Description: FileDigest plan limits, billing behavior, and subscription changes.
FileDigest plans control file count, job size, OCR access, monthly output tokens, and artifact retention.
| Plan | Price | Core limits |
|---|---:|---|
| Free | $0 | 25 files/job, 100 MB/job, OCR off, 2M output tokens/month, 72-hour retention |
| Pro | $15/month or $144/year | 100 files/job, 1 GB/job, OCR on, 100M output tokens/month, 30-day retention |
| Business | $39/month or $390/year | 250 files/job, 2 GB/job, OCR on, 500M output tokens/month, 90-day retention |
## Upgrades
Choose a paid plan from pricing when you need larger packets, OCR, more monthly output tokens, or longer retention.
## Billing management
Paid users manage plan changes, invoices, cancellation, and renewal details from the billing page.
## Custom needs
Email `support@filedigest.dev` for retention, volume, team workflow, or API roadmap questions.
## Supported Files
URL: https://filedigest.dev/docs/supported-files
Description: File types, bundles, and outputs supported by FileDigest.
## Inputs
Primary formats are PDF, DOCX, and PPTX. FileDigest also accepts TXT, Markdown, HTML, HTM, and ZIP bundles containing supported files.
ZIP bundles are useful for packets that belong together: a paper plus appendix, a client packet plus notes, or a policy document plus supporting text.
## Outputs
Every successful job is built around two artifacts:
- `digest.md`: readable Markdown designed for AI context windows and human review.
- `manifest.json`: structured metadata for file status, artifact status, sizes, warnings, and token estimates.
## What to avoid
Avoid password-protected documents, unsupported binaries, huge media files, and ZIP bundles that mostly contain unsupported formats.
For scans or image-heavy PDFs, use OCR on a paid plan.
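You can screen a ZIP bundle locally before uploading it. The sketch below is illustrative only: `inspect_zip` is a hypothetical helper, the extension set is inferred from the formats listed above (with `.md` assumed for Markdown), and nested ZIP files are treated as unsupported because this page does not say how they are handled. The encryption check covers ZIP-level encryption only; it cannot detect a password-protected PDF inside the bundle.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed extensions for the formats listed under Inputs.
SUPPORTED = {".pdf", ".docx", ".pptx", ".txt", ".md", ".html", ".htm"}


def inspect_zip(data):
    """Sort ZIP members into supported and unsupported; flag encrypted entries."""
    report = {"supported": [], "unsupported": [], "encrypted": []}
    with zipfile.ZipFile(io.BytesIO(data)) as bundle:
        for info in bundle.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower()
            bucket = "supported" if suffix in SUPPORTED else "unsupported"
            report[bucket].append(info.filename)
            if info.flag_bits & 0x1:  # bit 0 marks an encrypted ZIP entry
                report["encrypted"].append(info.filename)
    return report
```

If `unsupported` is longer than `supported`, the bundle is probably one of the "mostly unsupported" packets this section recommends avoiding.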
## Troubleshooting
URL: https://filedigest.dev/docs/troubleshooting
Description: Common FileDigest job issues and what to check first.
### Unsupported file
Use PDF, DOCX, PPTX, TXT, Markdown, HTML, HTM, or ZIP bundles. Files inside a ZIP must also use supported extensions.
### Job too large
Reduce the number of files, split the ZIP bundle, or upgrade if your packet exceeds your plan's file, size, or token limits.
### OCR needed
Image-heavy scans may produce poor text until OCR is enabled on a paid plan.
### Partial output
A partial job can still produce a useful digest. Check the file tab for failed files and warnings before deciding whether to re-run the packet.
### Download unavailable
Downloads are available only to the signed-in owner of the job. If artifacts expired under your plan's retention window, create a new job.
# Articles
## What an AI-Ready Document Context Pack Contains
URL: https://filedigest.dev/blog/ai-ready-document-context-packs
Description: A FileDigest job turns source files into a digest, manifest, metadata, and private downloads that can be reused in AI workflows.
An AI-ready context pack is not just a converted PDF. It is a small bundle of artifacts that a human can inspect before using it in an LLM, RAG system, or agent workflow.
FileDigest focuses on two core outputs first.
## `digest.md`
The digest is the readable Markdown output. It gives the user a cleaner source representation that can be copied into an LLM, stored in a prompt packet, or passed to downstream indexing.
## `manifest.json`
The manifest is the structured audit layer. It records job status, file metadata, processing outcomes, artifact types, token estimates, and storage references.
## Why both are needed
Markdown is useful for humans and LLM context. The manifest is useful for software, QA, reproducibility, and billing. Together, they make document preparation less fragile than a one-off conversion script.
## From Upload To Digest
URL: https://filedigest.dev/blog/from-upload-to-digest
Description: A short walkthrough of the FileDigest flow from private upload to signed downloads.
The FileDigest user flow is intentionally direct.
1. A signed-in user opens the FileDigest workbench.
2. The browser uploads source files to a private account-owned storage path.
3. FileDigest creates a job record and validates plan limits.
4. Secure Docling processing converts the uploaded objects.
5. FileDigest writes generated artifacts for the owner of the job.
6. The dashboard polls status and updates the job detail page.
7. The user previews the digest and downloads private artifacts.
The goal is not to add an AI chat surface before the document pipeline is trustworthy. The first job is to make source material clean, inspectable, and ready for the AI tools users already use.
## Why FileDigest Keeps Processing Separate
URL: https://filedigest.dev/blog/modal-docling-document-pipeline
Description: FileDigest keeps document conversion separate from the browser so outputs remain private, inspectable, and controlled.
FileDigest is built around a strict processing rule: uploaded documents are not converted in the browser, and the production app does not use a hidden local fallback worker.
The app handles identity, billing, plan limits, job metadata, and private artifact access. A secure Docling processing layer handles document conversion. That separation keeps heavy processing behind server-side controls and makes the user flow easier to trust.
## The processing contract
The browser uploads files to private paths and never receives internal processing credentials.
The processor accepts registered files, converts them with Docling, and writes output artifacts back to private paths:
- `digest.md`
- `manifest.json`
- optional bundles and logs
## Why this matters
Document prep is often a cost and reliability problem before it is an AI problem. FileDigest blocks oversized jobs before compute starts, tracks usage, and keeps downloads private through signed artifact access.
That makes the first paid product simple: prepare source material for AI, prove the output is useful, and keep the processing path operationally honest.