DocParse Docs

Supported File Types

DocParse processes documents up to 25 MB per file and up to 30 files per batch upload.

Document formats

| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Most common; supports text-based and scanned PDFs (we OCR scanned pages automatically). |
| Word | .docx, .doc | Modern and legacy Office formats. |
| Plain text | .txt, .md | Useful for already-parsed data or chat transcripts. |
| Images | .png, .jpg, .jpeg | Photographed or scanned documents; OCR is applied automatically. |
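
A quick client-side check can catch unsupported files before upload. The sketch below maps the extensions listed above to their format names; the function name `detect_format` and the case-insensitive matching are assumptions for illustration, not part of the DocParse API.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format map transcribed from the supported formats above.
SUPPORTED_FORMATS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".doc": "Word",
    ".txt": "Plain text",
    ".md": "Plain text",
    ".png": "Images",
    ".jpg": "Images",
    ".jpeg": "Images",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format name for a filename, or None if it is not listed."""
    # Assumption: extensions are matched case-insensitively ("SCAN.PDF" -> PDF).
    return SUPPORTED_FORMATS.get(Path(filename).suffix.lower())
```

For example, `detect_format("report.xlsx")` returns `None`, which a caller could use to skip the file or warn the user.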

What counts as a page?

We count pages as they appear in the source document; plain text has no pages of its own, so it is counted by length:

  • PDF — one rendered page = one page.
  • Word — one logical page in the document = one page.
  • Image — one image = one page.
  • Plain text — every 3,000 characters = one page (rounded up).
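
The plain-text rule is simple arithmetic, so a page count can be estimated before upload. This is a minimal sketch: it assumes "characters" means Python string length (Unicode code points, not bytes) and that an empty file counts as zero pages, neither of which is confirmed above.

```python
import math

CHARS_PER_PAGE = 3000  # from the plain-text rule above


def plain_text_pages(text: str) -> int:
    """Estimate the page count of a plain-text document, rounding up."""
    # Assumption: length is measured in code points, and "" is zero pages.
    return math.ceil(len(text) / CHARS_PER_PAGE)
```

Under these assumptions a 7,500-character transcript counts as 3 pages, and a 3,001-character file counts as 2.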

Size limits

| Limit | Value |
|---|---|
| Max file size | 25 MB |
| Max files per upload batch | 30 |
| Max pages per document | 200 |
| Concurrent batches per account | No hard cap; we'll scale with you |

Multi-page documents are processed in parallel under the hood. A 50-page invoice is typically returned in under 30 seconds.
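
The file-size and batch limits can be enforced on the client before anything is sent. The sketch below is illustrative only: the helper names are hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which the limits above do not specify.

```python
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILES_PER_BATCH = 30


def split_into_batches(paths, batch_size=MAX_FILES_PER_BATCH):
    """Split a list of file paths into batches of at most batch_size files."""
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


def oversized_files(sizes_by_name):
    """Return the names whose size in bytes exceeds the per-file limit."""
    return [name for name, size in sizes_by_name.items() if size > MAX_FILE_BYTES]
```

With 65 files, `split_into_batches` yields batches of 30, 30, and 5. File sizes for `oversized_files` could come from `os.path.getsize` on each path.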

OCR

Scanned PDFs and images go through OCR automatically. You don't need to do anything special: upload the file and the parser handles it.

For best OCR accuracy:

  • Aim for at least 300 DPI on scans.
  • Prefer color or grayscale over pure black-and-white.
  • Avoid heavy compression.
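
If a scan's pixel dimensions and the physical page size are known, the 300 DPI guideline can be checked with a little arithmetic. This is a rough sketch with hypothetical helper names; it takes the lower of the horizontal and vertical resolutions as the effective DPI.

```python
MIN_RECOMMENDED_DPI = 300  # from the OCR guidance above


def effective_dpi(width_px, height_px, width_in, height_in):
    """Return the lower of the horizontal and vertical pixels-per-inch values."""
    return min(width_px / width_in, height_px / height_in)


def meets_dpi_guideline(width_px, height_px, width_in, height_in):
    """True if the scan reaches the recommended resolution on both axes."""
    return effective_dpi(width_px, height_px, width_in, height_in) >= MIN_RECOMMENDED_DPI
```

A US Letter page (8.5 × 11 inches) scanned at 2550 × 3300 pixels works out to exactly 300 DPI; the same page at 1275 × 1650 pixels is 150 DPI and falls below the guideline.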

What we don't support (yet)

  • Excel spreadsheets (.xlsx, .xls) — coming soon.
  • PowerPoint (.pptx) — coming soon.
  • HTML / web pages — strip to text/markdown first.
  • Encrypted PDFs — must be decrypted before upload.
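
For HTML, one way to strip a page down to text before uploading it as a .txt file is Python's standard-library `html.parser`. The sketch below is a simple approach under stated assumptions, not a DocParse tool: `html_to_text` is a hypothetical helper that keeps visible text, drops `<script>` and `<style>` contents, and puts each text fragment on its own line.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This discards all structure (headings, tables, links). If layout matters, converting to Markdown with a dedicated tool would preserve more of it.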

Need a format we haven't listed? Let us know — we prioritize based on customer demand.