AI Document Processing Self-Test
A practical self-test checklist for PDF, DOCX, ZIP, OCR, Markdown, and manifest quality before scaling FileDigest usage.
Before you spend money on traffic or paid document workflows, judge FileDigest by completed jobs, readable output, and repeatable artifacts. This checklist gives buyers and operators a simple way to test whether document preparation is ready for production AI work.
Start with the public demo packet if you want a reproducible baseline, then run your own files. A meaningful self-test should use documents whose source quality you understand, because damaged scans, unusual tables, and encrypted files can change the result.
Test packet
Run one small packet with:
- one text-based PDF
- one OCR-heavy PDF if your plan supports OCR
- one DOCX file
- one PPTX file
- one HTML export
- one plain-text or Markdown note
- one ZIP containing mixed supported files
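Before uploading, it can help to confirm that the packet covers each file type in the list above. The following is a minimal sketch and not part of FileDigest: the `REQUIRED` mapping and the `missing_packet_types` helper are illustrative assumptions. An extension check cannot tell a text-based PDF from an OCR-heavy one, so it only asks for two PDFs; lower that minimum to 1 if your plan has no OCR.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping: checklist item -> (accepted extensions, minimum files).
REQUIRED = {
    "pdf": ({".pdf"}, 2),            # one text-based plus one OCR-heavy PDF
    "docx": ({".docx"}, 1),
    "pptx": ({".pptx"}, 1),
    "html": ({".html", ".htm"}, 1),
    "note": ({".txt", ".md"}, 1),    # plain-text or Markdown note
    "zip": ({".zip"}, 1),
}


def missing_packet_types(filenames):
    """Return the checklist items the packet does not yet cover."""
    counts = Counter(Path(name).suffix.lower() for name in filenames)
    missing = []
    for item, (extensions, minimum) in REQUIRED.items():
        found = sum(counts[ext] for ext in extensions)
        if found < minimum:
            missing.append(item)
    return missing


packet = ["report.pdf", "scan.pdf", "brief.docx", "deck.pptx",
          "export.html", "notes.md"]
print(missing_packet_types(packet))  # ['zip']
```

An empty list means every item in the packet list is represented at least by file extension; whether each file is a good test case still needs a human look.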
What to inspect
| Check | Good sign | Bad sign |
|---|---|---|
| File count | Every accepted file appears in the manifest | A file disappears without a warning |
| Page count | PDFs show plausible page counts; DOCX page counts are labeled as unknown when they cannot be determined | Counts are missing without explanation or clearly impossible |
| Source boundaries | Digest sections keep file/source IDs visible | The digest becomes one blended summary |
| Tables | Important tables remain readable or are flagged | Tables silently collapse into unusable text |
| Warnings | Problems are explicit in the manifest | The run pretends imperfect files were perfect |
| Downloads | Digest and manifest download through authenticated routes | Artifacts are public or inaccessible to the owner |
| Downstream reuse | The digest supports a defined ChatGPT, Claude, RAG, or analysis prompt | The output requires heavy manual cleanup before use |
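The file-count row in the table above can be checked mechanically once the manifest is downloaded. The sketch below assumes a hypothetical manifest shape with a `files` list and a `warnings` list; the field names `name`, `file`, and `message` are illustrative, and the real FileDigest schema may differ.

```python
def find_silent_drops(submitted, manifest):
    """Return submitted files that have neither a manifest entry nor a warning."""
    listed = {entry["name"] for entry in manifest.get("files", [])}
    warned = {w["file"] for w in manifest.get("warnings", []) if "file" in w}
    return sorted(set(submitted) - listed - warned)


# Hypothetical manifest used only for illustration.
manifest = {
    "files": [
        {"name": "report.pdf", "pages": 12},
        {"name": "brief.docx", "pages": None},  # page count unknown
    ],
    "warnings": [
        {"file": "scan.pdf", "message": "encrypted, skipped"},
    ],
}

submitted = ["report.pdf", "brief.docx", "scan.pdf", "deck.pptx"]
print(find_silent_drops(submitted, manifest))  # ['deck.pptx']
```

Here `scan.pdf` is acceptable because the manifest explains why it was skipped, while `deck.pptx` disappeared without a warning, which is the bad sign the table describes.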
Suggested scoring
Use a simple 0 to 2 score for each item:
- 0: failed or unusable
- 1: usable with human cleanup
- 2: usable as AI-ready context
This score is a practical self-test heuristic, not a certified benchmark. A first packet is promising when most checks score 2 and there are no privacy, access-control, or silent-data-loss failures.
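The 0 to 2 scores can be tallied with a few lines of code. This is a rough sketch under stated assumptions: "most" is read as more than half of the checks, and the `BLOCKING` set maps the privacy, access-control, and silent-data-loss caveat onto the "File count" and "Downloads" checks. Both choices are illustrative, not a FileDigest rule.

```python
# Assumption: a score of 0 on these checks counts as a blocking failure.
BLOCKING = {"File count", "Downloads"}


def assess(scores, blocking=BLOCKING):
    """Summarize 0-2 scores and flag whether the packet looks promising."""
    for check, score in scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"{check}: score must be 0, 1, or 2")
    twos = sum(1 for s in scores.values() if s == 2)
    # A blocking check that was never scored is treated as failed.
    blockers = sorted(c for c in blocking if scores.get(c, 0) == 0)
    return {
        "total": sum(scores.values()),
        "max": 2 * len(scores),
        "blockers": blockers,
        "promising": twos > len(scores) / 2 and not blockers,
    }


scores = {
    "File count": 2, "Page count": 2, "Source boundaries": 2, "Tables": 1,
    "Warnings": 2, "Downloads": 2, "Downstream reuse": 1,
}
print(assess(scores))
```

With these example scores, five of seven checks score 2 and neither blocking check failed, so the packet is flagged as promising with a total of 12 out of 14.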
Quality standards
The output should make source boundaries visible, preserve enough structure to review, and show failures in the manifest instead of pretending every file was perfect.
What FileDigest is optimizing for
FileDigest is not a black-box summarizer. The goal is AI-ready document preparation: inspectable Markdown, structured manifests, private downloads, and repeatable context packs.