Private, Secure Document Processing for AI

FileDigest turns your files into AI-ready Markdown, RAG chunks, and structured manifests in private per-user storage, with signed downloads and an agentic API.


FileDigest is private, secure document processing for AI. Drop, paste, or choose a file and it is converted into AI-ready context (Markdown, RAG chunks, and structured metadata) inside private per-user storage. Authenticated ownership checks and signed downloads keep your source files and outputs yours. Processing starts automatically on upload and routes straight to a live job view, so there is no separate "process" button and no public exposure of your documents.

How private processing works

Upload is one step. Drop, paste, or choose a file and the job starts immediately, then the page moves to a live view where you watch conversion happen. Behind that, FileDigest runs the Docling engine on warm Modal L4 GPUs. The converter and models load once per warm container, so repeat jobs run fast instead of paying a cold-start penalty every time.

Privacy is built into the storage layer, not bolted on. Raw uploads and generated artifacts live in private, per-user storage. Every download passes an authenticated ownership check and is served through a private signed URL, so neither a forwarded link nor a guessed URL gives access to someone else's documents.
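To make the idea concrete, here is a minimal sketch of how a signed download combined with an ownership check can work in general. It is not FileDigest's implementation, which is not published here. The secret, the query parameter names (uid, exp, sig), and the example path are all hypothetical, and the sketch uses a plain HMAC-SHA256 signature from the Python standard library.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def _signature(path: str, user_id: str, expires_at: int, secret: bytes) -> str:
    # Bind the signature to the path, the owner, and the expiry time.
    payload = f"{path}|{user_id}|{expires_at}".encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_url(path: str, user_id: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Return a download URL that only verifies for this path, owner, and expiry."""
    sig = _signature(path, user_id, expires_at, secret)
    return f"{path}?{urlencode({'uid': user_id, 'exp': expires_at, 'sig': sig})}"


def verify_url(url: str, session_user_id: str, now: int, secret: bytes = SECRET) -> bool:
    """Check the signature, the expiry, and that the caller owns the artifact."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        uid = query["uid"][0]
        exp = int(query["exp"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    expected = _signature(parsed.path, uid, exp, secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return False
    if now >= exp:
        return False  # link has expired
    # Ownership check: a valid link still fails for a different signed-in user.
    return uid == session_user_id
```

In this sketch a link signed for one user fails verification when another authenticated user presents it, and editing the path invalidates the signature, which is the property the paragraph above describes.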

What you can feed it

FileDigest accepts the formats real document work actually produces: PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles. A ZIP lets you hand over a whole folder of mixed sources in a single upload instead of processing files one at a time.

Scanned PDFs are detected automatically and OCR is applied without a manual toggle, so image-only documents still come back as usable text. Optional enrichments go further when you need them: formulas converted to LaTeX, code extraction, picture descriptions, and a high-accuracy VLM tier for difficult or visually dense pages.

What you get back

Each source produces a full set of artifacts rather than a single flattened text dump. Across a job you get a combined digest.md and a manifest.json that records what was processed and what was generated. Per source you get Markdown, HTML, Docling DocTags, Docling JSON, and heading-contextualized RAG chunks, all viewable side by side with the original PDF so you can confirm the conversion is faithful before you trust it downstream.

The RAG chunks carry their heading context, which matters when you embed and retrieve: a chunk that knows which section it came from gives an AI model cleaner grounding than a raw, context-free slice of text.
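The following sketch illustrates what "heading-contextualized" means in practice. It is a simplified stand-in, not the Docling chunker that FileDigest runs: it splits Markdown on ATX headings and attaches the trail of parent headings to each chunk. The names Chunk, chunk_markdown, and for_embedding are hypothetical, and the sketch ignores edge cases such as "#" lines inside fenced code.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")  # ATX headings: "# Title" to "###### Title"


@dataclass
class Chunk:
    headings: list[str]  # trail of parent headings, outermost first
    text: str

    def for_embedding(self) -> str:
        """Prefix the chunk text with its heading trail before embedding."""
        if not self.headings:
            return self.text
        return " > ".join(self.headings) + "\n" + self.text


def chunk_markdown(markdown: str) -> list[Chunk]:
    chunks: list[Chunk] = []
    trail: list[tuple[int, str]] = []  # (heading level, heading title)
    body: list[str] = []

    def flush() -> None:
        # Emit the text collected so far under the current heading trail.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk([title for _, title in trail], text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            body.append(line)
            continue
        flush()
        level = len(match.group(1))
        # Drop headings at the same or deeper level before opening the new one.
        while trail and trail[-1][0] >= level:
            trail.pop()
        trail.append((level, match.group(2)))
    flush()
    return chunks
```

Embedding the output of for_embedding() instead of the bare text means a retrieved chunk arrives with its section path attached, for example "Report > Methods" followed by the paragraph itself.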

Built for agents and automation

Document processing for AI usually feeds something automated, so FileDigest exposes an agentic REST API. You POST to /v1/parse and poll GET /v1/jobs/{id}, authenticating with a Bearer key. The API publishes an OpenAPI 3.1 spec at /openapi.json, supports idempotency keys so retried requests do not create duplicate jobs, and returns errors as RFC 9457 problem+json so failures are machine-readable. Agent-focused documentation lives at /llms.txt for tools that read their own integration guides.
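A client for the flow above might be sketched as follows, using only the Python standard library. Several details are assumptions rather than documented facts: the host in BASE_URL is a placeholder, the "Idempotency-Key" header name is a common convention that should be confirmed against /openapi.json, and the request body and the job's "finished" condition are left to the caller because their schema is not described here.

```python
import json
import time
import urllib.request
import uuid

BASE_URL = "https://filedigest.example"  # placeholder host, not the real one
IDEMPOTENCY_HEADER = "Idempotency-Key"   # assumed header name; check /openapi.json


def new_idempotency_key() -> str:
    """One key per logical upload; reuse the same key when retrying it."""
    return str(uuid.uuid4())


def build_request(method, path, api_key, body=None, idempotency_key=None):
    """Build an authenticated request without sending it."""
    headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if idempotency_key is not None:
        headers[IDEMPOTENCY_HEADER] = idempotency_key
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def parse_problem(content_type, raw_body):
    """Return the RFC 9457 problem object, or None if the body is not one."""
    if not content_type.lower().startswith("application/problem+json"):
        return None
    return json.loads(raw_body)


def wait_for_job(fetch_job, job_id, is_finished, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll until is_finished(job) is true.

    fetch_job is a caller-supplied function wrapping GET /v1/jobs/{id};
    is_finished inspects the returned job, whose fields are not assumed here.
    """
    for _ in range(max_polls):
        job = fetch_job(job_id)
        if is_finished(job):
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

To send a request, pass the result of build_request to urllib.request.urlopen and catch urllib.error.HTTPError. On failure, parse_problem(err.headers.get("Content-Type", ""), err.read()) yields the machine-readable problem object with the standard RFC 9457 members such as type, title, status, and detail. Keeping the polling loop separate from the transport makes it easy to test and to swap in a different HTTP library.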

FAQ

Is FileDigest actually private, or just hosted?

Both. Files are hosted so you do not run your own GPU pipeline, but they sit in private per-user storage. Downloads require an authenticated ownership check and are served through private signed URLs, so artifacts are not publicly listable or shareable by link alone.

Do I have to click a button to start processing?

No. Upload is the trigger. The moment you drop, paste, or choose a file, the job starts and the page routes to a live job view where you can watch the conversion progress.

How does it handle scanned or image-only PDFs?

FileDigest detects scanned PDFs automatically and applies OCR, so you do not have to flag them. For harder material you can enable optional enrichments (formulas to LaTeX, code, picture descriptions) and a high-accuracy VLM tier.

Can I run it from my own code or an agent?

Yes. Use the REST API: POST /v1/parse, then poll GET /v1/jobs/{id} with a Bearer key. There is an OpenAPI 3.1 spec at /openapi.json, idempotency keys for safe retries, RFC 9457 problem+json errors, and agent docs at /llms.txt. OCR, larger jobs, and higher token quotas are available on the Pro and Business plans, with a Free plan for small test jobs.