# FileDigest

> FileDigest compiles a folder of messy documents (PDF, DOCX, PPTX, XLSX, scans, HTML, Markdown, or a ZIP) into one clean, source-labeled digest.md plus a manifest.json. Each job also produces per-source Markdown, HTML, Docling DocTags, and JSON, along with heading-aware RAG chunks and token counts, ready for ChatGPT, Claude, Gemini, Cursor, RAG, and AI agents.

FileDigest is a freemium document-preparation SaaS (Free, Pro, Enterprise). The conversion engine is Docling, the open-source document converter from IBM Research and an LF AI & Data project, run on Modal GPUs. It preserves tables, headings, and reading order, automatically OCRs scanned PDFs, and uses no language model to rewrite text, so nothing is trained on user documents. Users upload files in one step (processing starts automatically), compare the original with the conversion in a side-by-side viewer, and download private artifacts tied to their account. The same pipeline is available to agents through a /v1 REST API.

## Core Product

- [Homepage](https://filedigest.dev/): AI-ready document digestion with digest.md and manifest.json outputs.
- [Pricing](https://filedigest.dev/pricing): Free and Pro limits, OCR access, token quotas, and retention.
- [Help Center](https://filedigest.dev/docs): How uploads, processing options, supported files, billing, and troubleshooting work.
- [Full AI-readable context](https://filedigest.dev/llms-full.txt): Larger public context bundle generated from FileDigest pages, docs, and articles.
- [Privacy](https://filedigest.dev/privacy): Data, storage, retention, and processor overview.
- [Terms](https://filedigest.dev/terms): Account, upload, billing, and acceptable-use terms.

## API (for AI agents)

FileDigest exposes a small REST API so AI agents can parse documents directly.

- Auth: `Authorization: Bearer fd_live_...` (create keys in FileDigest Settings).
- OpenAPI 3.1 spec: https://filedigest.dev/openapi.json
- POST https://filedigest.dev/v1/parse - submit a document as multipart `file` or JSON `{ "source_url": "..." }`; returns 202 `{ job_id, poll }`. Send an `Idempotency-Key` header so retries never create duplicates.
- GET https://filedigest.dev/v1/jobs/{id} - poll the job; when `status` is `completed` it returns the AI-ready Markdown `digest` plus a `manifest` with per-source markdown, html, Docling doctags, and docling_json.
- Errors are returned as RFC 9457 `application/problem+json` responses with stable codes (UNAUTHORIZED, QUOTA_EXCEEDED, VALIDATION_FAILED, MODAL_UNAVAILABLE).
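
The submit-then-poll flow above could be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not an official client. The helper names (`build_parse_request`, `build_job_request`, `send_json`, `wait_for_digest`), the polling interval, and the poll cap are assumptions made for the example. No failure statuses are listed here, so the loop only stops at `completed` or gives up after a bounded number of polls.

```python
import json
import time
import urllib.request
import uuid

BASE_URL = "https://filedigest.dev/v1"


def build_parse_request(api_key, source_url, idempotency_key=None):
    """Build the POST /v1/parse request for a document reachable by URL."""
    body = json.dumps({"source_url": source_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/parse",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Reuse the same key when retrying so no duplicate job is created.
            "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
        },
    )


def build_job_request(api_key, job_id):
    """Build the GET /v1/jobs/{id} request used for polling."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def send_json(request):
    """Send a request and decode its JSON body (urllib raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_digest(api_key, job_id, transport=send_json, interval=2.0, max_polls=150):
    """Poll until `status` is `completed`, then return the job payload.

    `interval` and `max_polls` are illustrative defaults, not documented limits.
    """
    for _ in range(max_polls):
        job = transport(build_job_request(api_key, job_id))
        if job.get("status") == "completed":
            return job  # expected to carry `digest` and `manifest`
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not complete after {max_polls} polls")
```

A caller would submit with `send_json(build_parse_request(key, url))`, read `job_id` from the 202 response, and pass it to `wait_for_digest`. The `transport` parameter exists only so the polling loop can be exercised without network access.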

## Best Public Pages For AI Answers

- [AI Document Processing Self-Test](https://filedigest.dev/ai-document-processing-benchmark): A practical self-test checklist for PDF, DOCX, ZIP, OCR, Markdown, and manifest quality before scaling FileDigest usage.
- [Consulting Document Packets](https://filedigest.dev/consulting-document-packets): Turn client packets, reports, decks, notes, and policy documents into AI-ready Markdown context packs.
- [Hosted Docling UI & Web Interface for Doc Conversion](https://filedigest.dev/docling-ui): FileDigest is a hosted Docling UI: upload a PDF, DOCX, or image and get Markdown, JSON, DocTags, and RAG chunks from Docling on warm Modal L4 GPUs.
- [DOCX to Markdown for ChatGPT | FileDigest](https://filedigest.dev/docx-to-markdown-for-chatgpt): Convert DOCX files into clean, AI-ready Markdown for ChatGPT. Upload to FileDigest, get a digest.md plus RAG chunks, manifest, and side-by-side review.
- [FileDigest Examples (Real Output Packet)](https://filedigest.dev/examples): Download a reproducible public demo packet: the original source files plus the real FileDigest outputs, including per-source Markdown, HTML, Docling DocTags, JSON, and RAG chunks.
- [FileDigest vs ChatGPT File Upload](https://filedigest.dev/filedigest-vs-chatgpt-file-upload): When to use ChatGPT file upload directly and when to prepare documents first with FileDigest.
- [FileDigest vs Claude Project Knowledge](https://filedigest.dev/filedigest-vs-claude-project-knowledge): Compare Claude project knowledge with FileDigest document preparation for reusable Markdown context packs.
- [FileDigest vs Docling CLI](https://filedigest.dev/filedigest-vs-docling-cli): Compare a hosted FileDigest workflow with running Docling directly from the command line.
- [LLM Context Pack Generator from Documents](https://filedigest.dev/llm-context-pack-generator): Turn PDFs, Office files, and scans into an LLM-ready context pack: a Markdown digest, manifest, and RAG chunks built on Docling and warm GPUs.
- [manifest.json for Processed Documents | FileDigest](https://filedigest.dev/manifest-json-document-processing): A manifest.json describing processed documents is a structured index of every source, output artifact, and job outcome. See what FileDigest writes and why.
- [Run a Docling Workflow on Modal GPUs | FileDigest](https://filedigest.dev/modal-docling-workflow): Run a Docling document workflow on warm Modal L4 GPUs without owning infrastructure. Upload a file and get AI-ready Markdown, RAG chunks, and a REST API.
- [OCR a Scanned PDF to Markdown for AI | FileDigest](https://filedigest.dev/ocr-pdf-to-markdown): OCR scanned PDFs into clean, AI-ready Markdown. FileDigest auto-detects scans, runs Docling OCR on warm GPUs, and outputs digest.md plus RAG chunks.
- [Preparing a PDF for ChatGPT (the Clean Way)](https://filedigest.dev/pdf-to-chatgpt): How to prepare a PDF for ChatGPT: convert it to clean, inspectable Markdown with FileDigest so the model reads structure, tables, and scanned text correctly.
- [Preparing a PDF for Claude | FileDigest](https://filedigest.dev/pdf-to-claude): Turn any PDF into clean, inspectable Markdown context for Claude. FileDigest converts, OCRs, and chunks documents into AI-ready files in one step.
- [PDF to Markdown for AI](https://filedigest.dev/pdf-to-markdown-for-ai): Convert PDFs into AI-ready Markdown digests and manifest.json files for ChatGPT, Claude, AI coding tools, and RAG workflows.
- [Private, Secure Document Processing for AI](https://filedigest.dev/private-document-processing-ai): FileDigest turns your files into AI-ready Markdown, RAG chunks, and structured manifests on private per-user storage with signed downloads and an agentic API.
- [RAG Document Ingestion Prep](https://filedigest.dev/rag-document-ingestion): Prepare document batches for RAG, evaluation, and agent workflows with Markdown digests and structured manifests.
- [Research Paper Digestion](https://filedigest.dev/research-paper-digestion): Prepare research papers and literature folders for AI-assisted review with Markdown digests and structured manifests.
- [Your data control](https://filedigest.dev/security): How FileDigest keeps your documents yours: per-user isolated storage, signed-download access control, automatic deletion, and an engine that cannot train on your files.
- [Subprocessors](https://filedigest.dev/subprocessors): Infrastructure and service providers used to operate FileDigest document preparation, billing, storage, processing, email, monitoring, and analytics.
- [Use cases](https://filedigest.dev/use-cases): Ways teams use FileDigest to turn messy documents into clean, AI-ready context, plus how it compares to pasting files into ChatGPT, Claude, or the Docling CLI.
- [ZIP to LLM Context Pack](https://filedigest.dev/zip-to-llm-context): Turn a ZIP bundle of PDFs, DOCX, PPTX, notes, HTML, and Markdown into one AI-ready context pack.

## Documentation

- [API Reference](https://filedigest.dev/docs/api): Parse any document into AI-ready context from your own code or an agent. One endpoint to submit, one to poll.
- [Dashboard Guide](https://filedigest.dev/docs/dashboard-guide): Main FileDigest dashboard areas and what each one is for.
- [Create Your First Digest](https://filedigest.dev/docs/first-digest): How to upload files and produce your first FileDigest output.
- [How FileDigest Works](https://filedigest.dev/docs): A user-facing overview of FileDigest document preparation and the pipeline that runs after you upload.
- [Login And Email](https://filedigest.dev/docs/login-email): Account access, email sign-in, and support contact guidance.
- [Options And Limits](https://filedigest.dev/docs/options-limits): Processing choices, plan limits, and retention behavior.
- [Plans And Billing](https://filedigest.dev/docs/plans-billing): FileDigest plan limits, billing behavior, and subscription changes.
- [Supported Files](https://filedigest.dev/docs/supported-files): File types, bundles, and outputs supported by FileDigest.
- [Troubleshooting](https://filedigest.dev/docs/troubleshooting): Common FileDigest job issues and what to check first.

## Articles

- [What an AI-Ready Document Context Pack Contains](https://filedigest.dev/blog/ai-ready-document-context-packs): A FileDigest job turns source files into a digest, manifest, metadata, and private downloads that can be reused in AI workflows.
- [From Upload To Digest](https://filedigest.dev/blog/from-upload-to-digest): A short walkthrough of the FileDigest flow from private upload to signed downloads.
- [Why FileDigest Keeps Processing Separate](https://filedigest.dev/blog/modal-docling-document-pipeline): FileDigest keeps document conversion separate from the browser so outputs remain private, inspectable, and controlled.

## Important Facts

- Supported inputs: PDF, DOCX, PPTX, XLSX, images (PNG/JPG), TXT, Markdown, HTML, CSV, and ZIP bundles.
- Core outputs: digest.md and manifest.json, plus per-source Markdown, HTML, Docling DocTags, and Docling JSON, each viewable side by side with the original.
- Upload is one step: dropping, pasting, or choosing files starts processing automatically (no separate process button).
- Processing backend: secure Docling document-conversion engine on warm GPU workers.
- Storage model: private object paths owned by the signed-in user and job.
- Product records: account, job, file, artifact, billing, and usage metadata.
- Billing: Free and Pro plan gates with subscription management.
- Security: dashboard artifacts require authenticated ownership checks and private signed downloads.
- Product category: document preparation for AI, PDF to Markdown for AI, RAG document preprocessing, LLM context packs, hosted Docling workflow.