API Reference
Parse any document into AI-ready context from your own code or an agent. One endpoint to submit, one to poll.
FileDigest ships a small public API so an agent or a script can do exactly what the dashboard does: send a file, get back clean Markdown, structured per-source representations, and RAG chunks. There are two endpoints, both authenticated with a Bearer key.
Authentication
Every request needs an API key. Create one in your dashboard under FileDigest Settings, then send it as a Bearer token:
Authorization: Bearer fd_live_...
Keys are tied to your account and your plan limits. Calls without a valid key return 401.
Submit a parse job
POST /v1/parse accepts either a multipart file upload or a JSON body containing source_url. It wraps the create, upload, register, and process steps in a single call and returns 202 with a job ID to poll.
curl -X POST https://filedigest.dev/v1/parse \
-H "Authorization: Bearer fd_live_..." \
-F "file=@report.pdf" \
-F "mode=accurate_tables"
To parse a file by URL instead of uploading bytes:
curl -X POST https://filedigest.dev/v1/parse \
-H "Authorization: Bearer fd_live_..." \
-H "Content-Type: application/json" \
-d '{ "source_url": "https://example.com/report.pdf", "ocr": true }'
Response:
{ "job_id": "abc123", "status": "accepted", "poll": "/v1/jobs/abc123" }
Options
| Field | Values | What it does |
|---|---|---|
| mode | fast_text, accurate_tables | Extraction strategy. Use accurate tables when structure matters. |
| ocr | true, false | Run OCR on scanned or image-only pages (requires a plan with OCR). |
| quality | standard, high | High uses the VLM pipeline for hard layouts (slower). |
| enrich_formulas | true, false | Convert math to LaTeX (slower). |
| enrich_code | true, false | Detect code blocks and language (slower). |
| describe_pictures | true, false | Generate image captions (slower, VLM). |
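As a rough illustration of how the options combine, the sketch below builds (but does not send) a JSON request for POST /v1/parse using only the Python standard library. The helper name build_parse_request and the client-side validation are hypothetical. It also assumes every option in the table can be sent in the JSON body next to source_url; the examples above only show ocr used that way.

```python
import json
import urllib.request

API_BASE = "https://filedigest.dev"  # host taken from the curl examples

# Allowed values copied from the options table. Checking them client-side is
# an illustrative convenience, not something the API is known to require.
ENUM_OPTIONS = {
    "mode": {"fast_text", "accurate_tables"},
    "quality": {"standard", "high"},
}
BOOL_OPTIONS = {"ocr", "enrich_formulas", "enrich_code", "describe_pictures"}


def build_parse_request(api_key, source_url, **options):
    """Build (but do not send) a POST /v1/parse request for a source URL."""
    for name, value in options.items():
        if name in ENUM_OPTIONS:
            if value not in ENUM_OPTIONS[name]:
                allowed = sorted(ENUM_OPTIONS[name])
                raise ValueError(f"{name} must be one of {allowed}")
        elif name in BOOL_OPTIONS:
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be True or False")
        else:
            raise ValueError(f"unknown option: {name}")

    # Assumption: options travel in the same JSON object as source_url.
    body = {"source_url": source_url, **options}
    return urllib.request.Request(
        f"{API_BASE}/v1/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the JSON response should then carry the job_id and poll path shown in the sample response.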
Idempotency
Send an Idempotency-Key header to make retries safe. Replaying the same key returns the original job instead of creating a duplicate.
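One way to use this from a client is to generate the key once per logical job and reuse it on every retry. The sketch below is a minimal illustration: submit_with_retries and the send callable are hypothetical names, and using a UUID as the key is an assumption, since this page does not specify a key format.

```python
import urllib.error
import uuid


def submit_with_retries(send, max_attempts=3):
    """Call send(headers) until it succeeds, reusing one Idempotency-Key.

    `send` is a hypothetical callable that performs POST /v1/parse with the
    given extra headers and returns the decoded JSON response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Generated once, so every retry replays the same key.
    # Assumption: any unique string (here a UUID) is accepted as a key.
    headers = {"Idempotency-Key": str(uuid.uuid4())}

    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers)
        except urllib.error.HTTPError as exc:
            # This sketch treats responses below 500 as non-retryable.
            if exc.code < 500:
                raise
            last_error = exc
        except OSError as exc:
            # Covers connection failures and timeouts raised by urllib.
            last_error = exc
    raise last_error
```

A production client would likely add a delay between attempts; it is left out here to keep the sketch short.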
The file size limit is 100 MB per API request. Errors for over-limit files, exhausted quota, and engine failures come back as RFC 9457 problem details with a code field (for example QUOTA_EXCEEDED, FILE_TOO_LARGE, MODAL_UNAVAILABLE).
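A client can branch on the code field rather than on free-text messages. The sketch below decodes a problem-details body into a small dict. The title, status, and detail members come from RFC 9457 itself, while the helper names, the UNKNOWN fallback, and the choice to treat MODAL_UNAVAILABLE as retryable are assumptions made for illustration.

```python
import json

# Assumption: an unavailable engine is transient and worth retrying.
RETRYABLE_CODES = {"MODAL_UNAVAILABLE"}


def parse_problem(raw_body):
    """Decode an RFC 9457 problem-details body into a small dict."""
    try:
        problem = json.loads(raw_body)
    except (ValueError, TypeError):
        problem = {}
    if not isinstance(problem, dict):
        problem = {}
    return {
        # `code` is the extension member this API adds to the standard ones.
        "code": problem.get("code", "UNKNOWN"),
        "status": problem.get("status"),
        "title": problem.get("title", ""),
        "detail": problem.get("detail", ""),
    }


def should_retry(problem):
    """Return True when the error code suggests a later attempt may succeed."""
    return problem["code"] in RETRYABLE_CODES
```

With urllib, the raw body of a failed call is available from the HTTPError via its read() method, so parse_problem(exc.read()) is one way to feed this helper.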
Poll for the result
GET /v1/jobs/{id} returns the current status. While the job is pending or processing, poll until it reaches completed or failed.
curl https://filedigest.dev/v1/jobs/abc123 \
-H "Authorization: Bearer fd_live_..."
A completed job carries the result inline:
{
  "job_id": "abc123",
  "status": "completed",
  "result": {
    "tokens": 24017,
    "parsed_files": 7,
    "failed_files": 0,
    "digest": "# report.pdf\n...AI-ready Markdown...",
    "manifest": { }
  }
}
Output
The result block holds everything you need downstream:
- digest: the combined, source-organized Markdown context pack (the same digest.md you download in the app).
- manifest: structured run metadata plus, for each source, a representations block with markdown, html, doctags, docling_json, and heading-contextualized chunks ready to embed.
- tokens, parsed_files, failed_files: counts for the run.
The dashboard also exposes the matching provenance.json for source URLs, hashes, and job provenance. See the Examples page for a real packet you can download.
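Putting the polling endpoint and the result block together, a client loop might look like the sketch below. The wait_for_job helper and the fetch_job callable are hypothetical, and the two-second interval and five-minute timeout are arbitrary defaults chosen for illustration, not documented recommendations.

```python
import time

# Terminal states named in the polling section.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job, job_id, interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status, then return it.

    `fetch_job` is a hypothetical callable that performs GET /v1/jobs/{id}
    with the Bearer header and returns the decoded JSON body as a dict.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']}")
        sleep(interval)
```

Once the returned job has status completed, job["result"]["digest"] holds the Markdown context pack and job["result"]["manifest"] holds the per-source representations. The sleep and clock parameters exist only so the loop can be exercised without real waiting.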
Machine-readable contract and agent files
- OpenAPI 3.1 spec: the full machine contract for both endpoints.
- llms.txt: a short agent-discovery file describing the product and its endpoints.
- llms-full.txt: the expanded agent-discovery file.
These let an agent discover and call FileDigest without reading this page first.