Run a Docling Workflow on Modal GPUs

Run a Docling document workflow on warm Modal L4 GPUs without owning infrastructure. Upload a file and get AI-ready Markdown, RAG chunks, and a REST API.


FileDigest runs a managed Docling workflow on warm Modal L4 GPUs, so you upload a document and get back AI-ready outputs (Markdown, RAG chunks, Docling JSON, and more) without provisioning, scaling, or maintaining any GPU infrastructure yourself.

What "running Docling on Modal GPUs" means here

Docling is an open-source document conversion engine that turns files like PDFs and slide decks into structured, machine-readable representations. Modal is a serverless GPU platform. FileDigest combines the two: Docling runs inside warm Modal containers backed by L4 GPUs, and the converter plus its models load once per warm container. The practical result is that a repeat job landing on an already-initialized container skips the model-load step, so that cost is paid once per container rather than on every request.
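
The load-once behavior described above can be illustrated with a small sketch. This is not FileDigest's code: `FakeConverter`, `get_converter`, and `handle_job` are hypothetical names, and the sleep merely stands in for model loading. On Modal, a container-startup hook would typically play the role of the cached loader; here only the Python standard library is used so the idea is easy to run locally.

```python
import time
from functools import lru_cache


class FakeConverter:
    """Hypothetical stand-in for a converter that loads heavyweight models."""

    loads = 0  # how many times the expensive initialization has run

    def __init__(self) -> None:
        time.sleep(0.05)  # simulate the one-time model-load cost
        FakeConverter.loads += 1

    def convert(self, name: str) -> str:
        return f"# Converted {name}"


@lru_cache(maxsize=1)
def get_converter() -> FakeConverter:
    """Build the converter on first use; later calls reuse the same instance."""
    return FakeConverter()


def handle_job(name: str) -> str:
    """Handle one request with the shared, already-initialized converter."""
    return get_converter().convert(name)


if __name__ == "__main__":
    for doc in ["a.pdf", "b.pdf", "c.pdf"]:
        print(handle_job(doc))
    print("model loads:", FakeConverter.loads)
```

Only the first call pays the initialization delay; every later call in the same process reuses the cached converter, which is the effect a warm container provides across requests.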

You do not write any deployment code, manage container lifecycles, or pick GPU types. FileDigest wraps the whole pipeline (authentication, plan limits, private storage, billing, job history, and downloads) around the Docling-on-Modal engine.

How the workflow runs, step by step

Starting the workflow takes one step. Drop a file, paste it, or choose it from your device, and processing begins automatically. There is no separate "process" button to click. You are then routed straight to a live job view where you can watch the conversion progress.

Supported inputs cover the formats most teams actually have: PDF, DOCX, PPTX, XLSX, images, TXT, Markdown, HTML, CSV, and ZIP bundles for batches of files.

Scanned PDFs are detected automatically and OCR is applied without you toggling anything. You can also turn on optional enrichments (converting formulas to LaTeX, extracting code, and generating picture descriptions), plus a high-accuracy VLM tier when you need maximum fidelity on complex layouts.

What you get back from each job

Every job produces a consistent set of outputs, some covering the whole job and some generated per source file:

  • A combined digest.md across the job and a manifest.json describing what was produced.
  • Per-source Markdown and HTML.
  • Docling DocTags and Docling JSON for structured downstream use.
  • Heading-contextualized RAG chunks, ready to embed and feed into a retrieval pipeline.
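
To make "heading-contextualized" concrete, here is a minimal sketch of the idea: each chunk carries the headings it sits under, and that heading path is prepended to the text before embedding. The `Chunk` shape, the `contextualize` helper, and the separator are assumptions for illustration only; the actual chunk schema FileDigest returns may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk shape: body text plus the headings above it."""

    text: str
    headings: list[str] = field(default_factory=list)


def contextualize(chunk: Chunk, separator: str = " > ") -> str:
    """Prefix the chunk text with its heading path so an embedding sees the context."""
    if not chunk.headings:
        return chunk.text
    return f"{separator.join(chunk.headings)}\n{chunk.text}"


if __name__ == "__main__":
    chunk = Chunk(
        text="Refunds are issued within 14 days.",
        headings=["Billing", "Refund policy"],
    )
    print(contextualize(chunk))
```

Without the heading path, a sentence like the one above loses the information that it belongs to a billing section, which tends to hurt retrieval on short chunks.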

You can view these outputs side by side with the original PDF, so it is easy to confirm that tables, headings, and figures were captured correctly before you trust the extraction.

Driving the same workflow from code or an agent

The hosted Docling-on-Modal workflow is also a programmable API, which matters if you want an agent or backend service to run conversions on your behalf. You send a POST to /v1/parse to submit a job and poll GET /v1/jobs/{id} to retrieve status and results. Authentication uses a Bearer API key.
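
The submit-then-poll flow above might look roughly like the following sketch. Only the two paths and the Bearer scheme come from the text; the host in `BASE_URL`, the request body format, the `status` field, and its terminal values are all assumptions, so check the OpenAPI spec before relying on any of them.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://filedigest.example"  # placeholder host, not the real one
TERMINAL_STATES = {"completed", "failed"}  # assumed status values


def auth_headers(api_key: str) -> dict[str, str]:
    """Bearer authentication header, as described in the text."""
    return {"Authorization": f"Bearer {api_key}"}


def build_submit_request(api_key: str, payload: bytes, content_type: str) -> urllib.request.Request:
    """Build POST /v1/parse. Sending the raw file bytes as the body is an assumption."""
    headers = auth_headers(api_key)
    headers["Content-Type"] = content_type
    return urllib.request.Request(
        f"{BASE_URL}/v1/parse", data=payload, headers=headers, method="POST"
    )


def build_status_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build GET /v1/jobs/{id} for a previously submitted job."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs/{job_id}", headers=auth_headers(api_key), method="GET"
    )


def send_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job reports a terminal state or attempts run out."""
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")


# Usage sketch (performs real network calls; the "id" field is assumed):
#   job = send_json(build_submit_request(key, open("report.pdf", "rb").read(), "application/pdf"))
#   result = poll_job(lambda: send_json(build_status_request(key, job["id"])))
```

Passing the status fetcher in as a callable keeps the polling loop independent of the HTTP client, so it can be exercised with a fake before pointing it at the live API.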

The API is built for automation: there is an OpenAPI 3.1 spec at /openapi.json, support for idempotency keys so retries do not create duplicate jobs, and errors returned as RFC 9457 problem+json so failures are machine-readable. Agent-focused documentation lives at /llms.txt.
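
A client can take advantage of both features with very little code. The sketch below parses an RFC 9457 problem document using the members the RFC defines (`type`, `title`, `status`, `detail`, `instance`) and generates a retry-safe key. The `Idempotency-Key` header name and the sample error body are assumptions; the exact header and problem types FileDigest uses should be taken from its OpenAPI spec.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Optional

PROBLEM_JSON = "application/problem+json"


@dataclass
class Problem:
    """Standard RFC 9457 members; each one is optional in the spec."""

    type: str = "about:blank"  # the RFC's default when "type" is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None


def parse_problem(content_type: str, body: bytes) -> Optional[Problem]:
    """Return a Problem if the response is problem+json, otherwise None."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != PROBLEM_JSON:
        return None
    data = json.loads(body)
    return Problem(
        type=data.get("type", "about:blank"),
        title=data.get("title"),
        status=data.get("status"),
        detail=data.get("detail"),
        instance=data.get("instance"),
    )


def new_idempotency_headers() -> dict[str, str]:
    """Create one key per logical job and resend the same key on every retry.

    The header name is an assumption, not taken from FileDigest's documentation.
    """
    return {"Idempotency-Key": str(uuid.uuid4())}


if __name__ == "__main__":
    sample = b'{"title": "Quota exceeded", "status": 429, "detail": "Plan limit reached."}'
    print(parse_problem("application/problem+json; charset=utf-8", sample))
```

The key point for retries is to generate the key once, before the first attempt, and reuse it; generating a fresh key per attempt would defeat the deduplication.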

Your documents stay private throughout. Storage is per-user, ownership is verified as part of authenticated requests, and downloads are served through private signed links rather than public URLs.

FAQ

Do I need a Modal account or GPU setup to run Docling?

No. FileDigest operates the Modal GPU infrastructure for you. You sign in, upload a file, and the Docling workflow runs on warm Modal L4 GPUs behind the scenes. There is nothing to deploy, scale, or maintain on your side.

Why does FileDigest run Docling on warm GPUs?

Document conversion models are expensive to load. By keeping containers warm, FileDigest loads the Docling converter and its models once per container, so subsequent jobs that hit a warm container skip that startup cost and finish faster.

Can it handle scanned documents and complex layouts?

Yes. Scanned PDFs are detected automatically and OCR is applied. For documents with formulas, code, or images, you can enable optional enrichments (formulas to LaTeX, code extraction, picture descriptions), and a high-accuracy VLM tier is available for the hardest layouts. Note that OCR and larger jobs are part of the paid Pro and Business plans.

How do I run a Docling workflow programmatically?

Send a POST /v1/parse request with a Bearer API key to start a job, then poll GET /v1/jobs/{id} for results. The OpenAPI 3.1 spec is at /openapi.json, errors follow RFC 9457 problem+json, idempotency keys prevent duplicate jobs, and agent docs are at /llms.txt.