Preparing a PDF for Claude | FileDigest
Turn any PDF into clean, inspectable Markdown context for Claude. FileDigest converts, OCRs, and chunks documents into AI-ready files in one step.
To prepare a PDF for Claude, convert it into clean Markdown before you prompt, so the model reads accurate text instead of raw page layout or scanned images. FileDigest does this in one step: upload a PDF and it returns a readable digest.md, structured outputs, and RAG-ready chunks you can paste into Claude, attach to a Claude Project, or feed an agent.
Why convert a PDF instead of uploading it raw
Claude can accept files directly, and that is fine for a quick one-off question. But raw PDFs carry layout noise: columns get scrambled, tables flatten, headers and footers repeat, and scanned pages contain no selectable text at all. When the same document will be reused across many prompts, shared with teammates, or queried by an agent, it pays to prepare it once into clean, inspectable text.
FileDigest produces that prepared text and keeps it alongside the original so you can verify the conversion was faithful before you trust it in a prompt.
One-step conversion to Claude-ready Markdown
Drop, paste, or choose a file and processing starts automatically. There is no separate "process" button, and the page routes straight to a live job view so you can watch the conversion run.
Each conversion produces:
- a combined digest.md for readable AI context
- a manifest.json listing source files, outcomes, warnings, and token estimates
- per-source Markdown, HTML, Docling DocTags, and Docling JSON
- heading-contextualized RAG chunks for retrieval workflows
The outputs are viewable side by side with the original PDF, so you can confirm a table or figure caption survived the conversion before pasting it into Claude.
Inputs are not limited to PDF. FileDigest also accepts DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles, so a mixed packet becomes one consistent context pack.
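To make "heading-contextualized" chunks concrete, here is a minimal Python sketch of the general idea: split Markdown at headings and tag each chunk with the trail of headings above it. This is an illustration only, not FileDigest's implementation; the function name chunk_by_heading and the output keys context and text are hypothetical.

```python
import re

# Matches ATX headings such as "## Methods". This simple sketch does not
# skip fenced code blocks, so a "# comment" line inside code would be
# treated as a heading.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks, each tagged with its heading trail."""
    path: list[str] = []   # current heading trail, e.g. ["Report", "Methods"]
    body: list[str] = []   # lines collected under the current heading
    chunks: list[dict] = []

    def flush() -> None:
        # Emit the collected lines as one chunk, if there is any text.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"context": " > ".join(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level, then add this one.
            del path[level - 1:]
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk carries its own context string, so a retriever can return a passage such as "We measured X." together with the label "Report > Methods" and the model still knows where the passage came from.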
Scanned PDFs, formulas, and figures handled automatically
Scanned PDFs are detected automatically and OCR is applied, so image-only documents still produce real text. Optional enrichments turn formulas into LaTeX, capture code blocks, and generate picture descriptions, and a high-accuracy VLM tier is available when layout fidelity matters most.
Under the hood, conversion runs on Docling using warm Modal L4 GPUs. The converter and models load once per warm container, so repeat jobs run fast rather than paying cold-start cost every time.
Built for agents and pipelines, not just copy-paste
If Claude is driving an automated workflow, you do not have to use the web UI at all. FileDigest exposes an agentic REST API: POST /v1/parse to submit a document and GET /v1/jobs/{id} to poll for results. It uses Bearer key auth, publishes an OpenAPI 3.1 spec at /openapi.json, supports idempotency keys for safe retries, and returns RFC 9457 problem+json errors. Agent-readable docs live at /llms.txt.
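As a rough sketch of how an agent might call these endpoints, the Python below builds the two requests and summarizes a problem+json error using only the standard library. Several details are assumptions rather than documented behavior: the host filedigest.example is a placeholder, the Idempotency-Key header name follows a common convention, and the request body format is left to the caller. Check /openapi.json for the actual contract.

```python
import json
import urllib.error
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://filedigest.example"  # ASSUMPTION: placeholder host


def build_parse_request(api_key: str, body: bytes, content_type: str,
                        idempotency_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /v1/parse request. The body format is up to the caller."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": content_type,
        # ASSUMPTION: header name; reuse the same key when retrying a submit.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    return urllib.request.Request(f"{BASE_URL}/v1/parse", data=body,
                                  headers=headers, method="POST")


def build_job_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build a GET /v1/jobs/{id} request for polling."""
    return urllib.request.Request(f"{BASE_URL}/v1/jobs/{job_id}",
                                  headers={"Authorization": f"Bearer {api_key}"},
                                  method="GET")


def describe_problem(raw: bytes) -> str:
    """Summarize an RFC 9457 problem document (all members are optional)."""
    problem = json.loads(raw)
    head = f"{problem.get('status', '?')} {problem.get('title', 'Unknown error')}"
    detail = problem.get("detail")
    return f"{head}: {detail}" if detail else head


def send(req: urllib.request.Request) -> dict:
    """Send a request; ASSUMES success and error bodies are both JSON."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_problem(err.read())) from err
```

Generating the idempotency key once and passing it to every retry of the same submission is what makes the retry safe: the server can recognize the repeat instead of starting a second job.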
Everything stays private: per-user storage, authenticated ownership checks, and private signed downloads, so a prepared document is only accessible to you.
FAQ
Markdown versus PDF upload
For a one-off question, uploading the PDF directly is fastest. Paste the digest.md instead when the same document will be reused across prompts, shared, audited, or queried by an agent, because the prepared text is cleaner, repeatable, and downloadable.
Scanned and image-only PDF support
Scanned and image-only PDFs are supported. They are detected automatically and OCR is applied, so you get selectable, accurate text from image-only pages. FileDigest has Free, Pro, and Business plans; OCR, larger jobs, and higher token quotas are included with Pro and Business.
Organizing multiple files in Claude Projects
Upload your files (or a ZIP bundle) and FileDigest returns one combined digest.md plus a manifest.json with token estimates, so you can size the context for a Claude Project and attach a single clean pack instead of many loose PDFs.
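Sizing a pack from the manifest could look roughly like the Python sketch below. The manifest schema is not described here, so the field names sources and token_estimate are hypothetical stand-ins; adjust them to whatever your manifest.json actually contains. The token budget is a number you choose.

```python
import json


def total_tokens(manifest: dict) -> int:
    """Sum per-source token estimates.

    HYPOTHETICAL schema: {"sources": [{"name": ..., "token_estimate": ...}]}
    """
    return sum(entry.get("token_estimate", 0)
               for entry in manifest.get("sources", []))


def fits_budget(manifest: dict, budget: int) -> bool:
    """Return True when the whole pack fits inside the chosen token budget."""
    return total_tokens(manifest) <= budget


def load_manifest(path: str) -> dict:
    """Read a manifest.json file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

If the total is over budget, the same per-source estimates show which files to drop or split first.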
Integration with automated Claude agents
Automated Claude agents can use the REST API: POST /v1/parse to submit a document and GET /v1/jobs/{id} to poll for results, with Bearer key auth, idempotency keys, and an OpenAPI 3.1 spec at /openapi.json. Agent-facing docs are at /llms.txt.