Extraction Details
Deeper notes on how extractions are configured and what fields you get back.
Field types
| Type | Returned as JSON | Notes |
|---|---|---|
| `string` | string | Default for text-like fields. |
| `number` | number | Decimals are preserved. |
| `integer` | number | Rounded to the nearest integer. |
| `boolean` | boolean | `true` or `false` only. |
| `date` | ISO 8601 string | E.g. `"2026-05-24"`. |
| `datetime` | ISO 8601 string | E.g. `"2026-05-24T10:00:00Z"`. |
| `object` | nested object | Define its fields recursively. |
| `list<string>` | `string[]` | Useful for tags or tracking numbers. |
| `list<object>` | `object[]` | For repeating sections such as line items or party blocks. |
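To make the table concrete, here is a minimal Python sketch that checks a returned JSON value against its declared field type. `matches_type` and the `CHECKS` map are hypothetical helpers, not part of the service. The sketch assumes values have already been decoded with `json.loads`, and for `object` and `list<object>` it only checks the container shape, not the nested fields.

```python
import json
from datetime import date, datetime


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _parses(parser, value):
    # True if `value` is a string that `parser` accepts without a ValueError
    if not isinstance(value, str):
        return False
    try:
        parser(value)
        return True
    except ValueError:
        return False


def _parse_datetime(value):
    # Python < 3.11 rejects a trailing "Z", so rewrite it as an explicit UTC offset
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# One check per field type from the table (hypothetical client-side helper)
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": _is_number,
    # integers come back as JSON numbers already rounded, so require a whole value
    "integer": lambda v: _is_number(v) and float(v).is_integer(),
    "boolean": lambda v: isinstance(v, bool),
    "date": lambda v: _parses(date.fromisoformat, v),
    "datetime": lambda v: _parses(_parse_datetime, v),
    # containers only: nested fields are not checked in this sketch
    "object": lambda v: isinstance(v, dict),
    "list<string>": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
    "list<object>": lambda v: isinstance(v, list) and all(isinstance(x, dict) for x in v),
}


def matches_type(field_type, value):
    """Return True if a decoded JSON value has the shape the field type promises."""
    if field_type not in CHECKS:
        raise ValueError(f"unknown field type: {field_type}")
    return CHECKS[field_type](value)


# Illustrative payload; the field names are made up for this example
payload = json.loads('{"invoice_date": "2026-05-24", "quantity": 3, "tags": ["urgent"]}')
print(matches_type("date", payload["invoice_date"]))  # True
print(matches_type("integer", payload["quantity"]))  # True
print(matches_type("list<string>", payload["tags"]))  # True
```

A lookup table of small predicates keeps each type's rule next to its name, which makes it easy to line up with the table row by row.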
Field examples
Adding a one-line example to each field greatly improves accuracy. The example is shown to the model as a hint only; returned values are not forced to match it.
```json
{
  "key": "invoice_number",
  "type": "string",
  "description": "The unique invoice ID from the seller.",
  "example": "INV-2026-00128"
}
```
Language
Set the language at the extraction level. Options:
- `"Multi-Lingual"` (default): auto-detect the language per document.
- `"English"`, `"French"`, `"German"`, `"Spanish"`, `"Hindi"`, `"Japanese"`, `"Chinese (Simplified)"`, and more.
Forcing a specific language gives a small accuracy improvement on borderline-quality scans.
Document Options
Per-extraction toggles that influence parsing behavior:
| Option | Effect |
|---|---|
| `ocr_priority` | When `true`, treat the document as scanned and skip text-layer extraction. Useful for low-quality PDFs. |
| `infer_missing` | Allow the model to infer fields that aren't explicitly written (e.g. total = subtotal + tax). |
| `strict_format` | Reject documents that don't match the expected layout; instead of a normal result, they return `status: needs_review`. |
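The option names in the table are per-extraction toggles, so a client can catch typos before sending anything. The Python sketch below only assembles and validates a plain dict; `build_document_options` is a hypothetical helper, and where the dict is attached in an extraction request is an assumption left to your integration. It includes only the options you pass explicitly, since server-side defaults are not documented here.

```python
import json

# Option names taken from the Document Options table
KNOWN_OPTIONS = {"ocr_priority", "infer_missing", "strict_format"}


def build_document_options(**flags):
    """Assemble per-extraction toggles, rejecting unknown names and non-boolean values.

    Hypothetical helper: only explicitly passed options are included, so any
    server-side defaults are left untouched.
    """
    unknown = set(flags) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown document option(s): {sorted(unknown)}")
    for name, value in flags.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be true or false, got {value!r}")
    return dict(flags)


# A low-quality scanned PDF: skip the text layer and let the model infer totals
options = build_document_options(ocr_priority=True, infer_missing=True)
print(json.dumps(options))  # {"ocr_priority": true, "infer_missing": true}
```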
Confidence scores
Every extracted field comes back with a confidence value in [0, 1].
Treat anything below 0.7 as needing human review.
```json
{
  "result": { "total_amount": 4108.26 },
  "confidence": { "total_amount": 0.98 }
}
```
needs_review status
A file lands in `needs_review` when at least one field has a confidence below your configured threshold (0.7 by default). The result is still available, but you should surface it in your UI so a human can double-check it.
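The review rule described in the confidence and `needs_review` sections can also be applied client-side, for example to highlight exactly which fields a reviewer should look at. This Python sketch takes a response shaped like the JSON example (a `result` object plus a parallel `confidence` object) and returns the fields below a threshold, defaulting to 0.7. `fields_needing_review` is a hypothetical helper; the second field and its 0.64 score in the sample input are illustrative, and the sketch assumes a flat confidence map like the one in the example.

```python
import json

DEFAULT_THRESHOLD = 0.7  # mirrors the documented default review threshold


def fields_needing_review(response, threshold=DEFAULT_THRESHOLD):
    """Return the sorted keys whose confidence is strictly below `threshold`.

    A field present in `result` but missing from `confidence` is treated as
    needing review, a conservative choice made by this sketch.
    """
    confidence = response.get("confidence", {})
    flagged = []
    for key in response.get("result", {}):
        score = confidence.get(key)
        if score is None or score < threshold:
            flagged.append(key)
    return sorted(flagged)


# Illustrative response: invoice_number and its score are made up
response = json.loads("""
{
  "result": {"total_amount": 4108.26, "invoice_number": "INV-2026-00128"},
  "confidence": {"total_amount": 0.98, "invoice_number": 0.64}
}
""")
flagged = fields_needing_review(response)
print(flagged)  # ['invoice_number']
```

Any non-empty list corresponds to the situation where the file would land in `needs_review` under the same threshold.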
Resubmitting
To re-process a single file (e.g. after editing the schema), `DELETE` it and upload it again. Its pages are charged again.