API Reference

Parse any document into AI-ready context from your own code or an agent. One endpoint to submit, one to poll.

FileDigest ships a small public API so an agent or a script can do exactly what the dashboard does: send a file, get back clean Markdown, structured per-source representations, and RAG chunks. There are two endpoints, both authenticated with a Bearer key.

Authentication

Every request needs an API key. Create one in your dashboard under FileDigest Settings, then send it as a Bearer token:

Authorization: Bearer fd_live_...

Keys are tied to your account and your plan limits. Calls without a valid key return 401.

Submit a parse job

POST /v1/parse accepts either a multipart upload with a file field or a JSON body with a source_url. It wraps the create, upload, register, and process steps in a single call and returns 202 Accepted with a job id to poll.

curl -X POST https://filedigest.dev/v1/parse \
  -H "Authorization: Bearer fd_live_..." \
  -F "file=@report.pdf" \
  -F "mode=accurate_tables"

To parse a file by URL instead of uploading bytes:

curl -X POST https://filedigest.dev/v1/parse \
  -H "Authorization: Bearer fd_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "source_url": "https://example.com/report.pdf", "ocr": true }'

Response:

{ "job_id": "abc123", "status": "accepted", "poll": "/v1/jobs/abc123" }
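For scripts that would rather not shell out to curl, the URL-based submission above can be sketched with Python's standard library. This is an illustration, not an official client: build_parse_request and submit_parse are hypothetical helpers, and any extra keyword arguments are forwarded as JSON option keys, the same way ocr is sent in the curl example.

```python
import json
import urllib.request

API_BASE = "https://filedigest.dev"

def build_parse_request(api_key: str, source_url: str, **options) -> urllib.request.Request:
    """Build a POST /v1/parse request that parses a file by URL.

    `options` become extra JSON keys next to source_url (for example ocr=True).
    """
    body = {"source_url": source_url, **options}
    return urllib.request.Request(
        f"{API_BASE}/v1/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def submit_parse(api_key: str, source_url: str, **options) -> str:
    """Send the request and return the job id from the 202 response body."""
    req = build_parse_request(api_key, source_url, **options)
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["job_id"]
```

Usage: job_id = submit_parse("fd_live_...", "https://example.com/report.pdf", ocr=True). Keeping request construction separate from sending makes the request easy to inspect or log before it goes out.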

Options

Field              Values                      What it does
mode               fast_text, accurate_tables  Extraction strategy. Use accurate_tables when table structure matters.
ocr                true, false                 Run OCR on scanned or image-only pages (requires a plan that includes OCR).
quality            standard, high              high uses the VLM (vision-language model) pipeline for hard layouts (slower).
enrich_formulas    true, false                 Convert math to LaTeX (slower).
enrich_code        true, false                 Detect code blocks and their language (slower).
describe_pictures  true, false                 Generate image captions (slower, uses the VLM).
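If you assemble options programmatically, a small client-side check can catch typos before a request is sent. The sketch below only mirrors the table above: validate_options is a hypothetical helper, it assumes JSON-style values (real booleans, as in the source_url example), and the server remains the authority on what it accepts.

```python
# Allowed values copied from the Options table. Checking them locally is an
# optional convenience; the API does not require it.
BOOL_OPTIONS = {"ocr", "enrich_formulas", "enrich_code", "describe_pictures"}
ENUM_OPTIONS = {
    "mode": {"fast_text", "accurate_tables"},
    "quality": {"standard", "high"},
}

def validate_options(options: dict) -> dict:
    """Raise ValueError on an unknown option or an out-of-range value."""
    for name, value in options.items():
        if name in BOOL_OPTIONS:
            # isinstance(..., bool) rejects 1/0 and strings like "yes".
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be true or false, got {value!r}")
        elif name in ENUM_OPTIONS:
            if value not in ENUM_OPTIONS[name]:
                allowed = ", ".join(sorted(ENUM_OPTIONS[name]))
                raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
        else:
            raise ValueError(f"unknown option: {name}")
    return options
```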

Idempotency

Send an Idempotency-Key header to make retries safe. Replaying the same key returns the original job instead of creating a duplicate.
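The property that makes this safe is that every retry of the same logical submission reuses one key. A minimal sketch of that pattern: send is a hypothetical caller-supplied function that performs POST /v1/parse with the key in the Idempotency-Key header, and the choice of UUIDs for keys, the retry count, and the backoff are assumptions of this example, not documented FileDigest behavior.

```python
import time
import uuid
from typing import Callable

def send_with_retries(send: Callable[[str], dict], attempts: int = 3,
                      backoff: float = 1.0) -> dict:
    """Retry a submission, reusing one Idempotency-Key for every attempt.

    `send` (hypothetical) performs POST /v1/parse with the key in the
    Idempotency-Key header and raises ConnectionError or TimeoutError on a
    transient failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    key = str(uuid.uuid4())  # one key per logical job, not per attempt
    for attempt in range(attempts):
        try:
            return send(key)
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * 2 ** attempt)  # exponential backoff
```

Generating a fresh key inside the loop would defeat the purpose: a request that reached the server before the connection dropped would then create a second job.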

Limits and errors

Each API request accepts a file of up to 100 MB. Over-limit, quota, and engine errors are returned as RFC 9457 problem details that include a code field (for example, QUOTA_EXCEEDED, FILE_TOO_LARGE, or MODAL_UNAVAILABLE).
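A sketch of turning an error body into something a script can branch on. parse_problem and should_retry are hypothetical helpers, the sample body in the test is made up for illustration, and treating MODAL_UNAVAILABLE as retryable is an assumption rather than documented behavior.

```python
import json

def parse_problem(body: bytes) -> tuple[str, str]:
    """Return (code, message) from an RFC 9457 problem-details body."""
    try:
        problem = json.loads(body)
    except ValueError:  # not JSON at all, e.g. an HTML error page from a proxy
        return "UNKNOWN", body.decode("utf-8", errors="replace")
    if not isinstance(problem, dict):
        return "UNKNOWN", str(problem)
    # "code" is the FileDigest field described above; "detail" and "title"
    # are standard RFC 9457 members, so prefer the more specific detail.
    message = problem.get("detail") or problem.get("title") or ""
    return problem.get("code", "UNKNOWN"), message

# Assumption: engine outages are transient and worth retrying later, while
# quota and size errors need the caller to change something first.
RETRYABLE_CODES = {"MODAL_UNAVAILABLE"}

def should_retry(code: str) -> bool:
    return code in RETRYABLE_CODES
```

In a urllib-based client, catch urllib.error.HTTPError as exc and pass exc.read() to parse_problem.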

Poll for the result

GET /v1/jobs/{id} returns the job's current status. Keep polling while the job is still in progress (for example, pending or processing) until it reaches completed or failed.

curl https://filedigest.dev/v1/jobs/abc123 \
  -H "Authorization: Bearer fd_live_..."

A completed job carries the result inline:

{
  "job_id": "abc123",
  "status": "completed",
  "result": {
    "tokens": 24017,
    "parsed_files": 7,
    "failed_files": 0,
    "digest": "# report.pdf\n...AI-ready Markdown...",
    "manifest": { }
  }
}
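The polling loop can be sketched as follows. The status names come from this page; the starting interval, the doubling backoff, the 30-second cap, and the 10-minute timeout are assumptions for this example, since no polling rate is specified here. fetch_job and wait_for_job are hypothetical helpers, and wait_for_job takes any zero-argument function so it can be exercised without a network.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://filedigest.dev"
TERMINAL = {"completed", "failed"}

def fetch_job(api_key: str, job_id: str) -> dict:
    """GET /v1/jobs/{id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_job(get_job: Callable[[], dict], interval: float = 2.0,
                 max_interval: float = 30.0, timeout: float = 600.0) -> dict:
    """Poll until the job is completed or failed, backing off between calls."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job()
        if job["status"] in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job.get('job_id')} still {job['status']}")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # back off, up to a cap
```

Usage: job = wait_for_job(lambda: fetch_job("fd_live_...", "abc123")), then read job["result"] only if job["status"] == "completed".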

Output

The result block holds everything you need downstream:

  • digest: the combined, source-organized Markdown context pack (the same digest.md you download in the app).
  • manifest: structured run metadata plus, for each source, a representations block with markdown, html, doctags, docling_json, and heading-contextualized chunks ready to embed.
  • tokens, parsed_files, failed_files: counts for the run.

The dashboard also provides the matching provenance.json, which records source URLs, hashes, and job provenance. See the Examples page for a real packet you can download.

Machine-readable contract and agent files

  • OpenAPI 3.1 spec: the full machine contract for both endpoints.
  • llms.txt: a short agent-discovery file describing the product and its endpoints.
  • llms-full.txt: the expanded agent-discovery file.

These let an agent discover and call FileDigest without reading this page first.