Classification Details

Deeper notes on how classifications are configured.

Field	Required	Purpose
`name`	yes	The label shown in UIs and webhook payloads.
`description`	recommended	Plain-English explanation of what belongs in this category. The classifier compares each file against this text.
`keywords`	optional	Up to 20 words or phrases that strongly indicate this category. The classifier gives them extra weight.
`linked_extraction_id`	optional	If set, files in this category auto-chain into the linked extraction.
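Before creating a category, it can help to sanity-check the definition against the field table. The sketch below is a hypothetical client-side check, not part of the API: the plain-dict layout and the example category are assumptions, and only the field names and the 20-keyword limit come from the table.

```python
MAX_KEYWORDS = 20  # documented limit on `keywords`


def check_category(category: dict) -> list[str]:
    """Return human-readable issues with a category definition (empty if none)."""
    issues = []
    if not category.get("name"):
        issues.append("`name` is required")
    if not category.get("description"):
        issues.append("`description` is missing; the classifier relies on it")
    keywords = category.get("keywords", [])
    if len(keywords) > MAX_KEYWORDS:
        issues.append(f"`keywords` has {len(keywords)} entries; the limit is {MAX_KEYWORDS}")
    return issues


# Hypothetical example category; linked_extraction_id is optional and omitted here.
vendor_invoice = {
    "name": "Vendor invoice (B2B)",
    "description": (
        "An invoice issued by a supplier to a business. "
        "Lists line items, an amount due, and payment terms."
    ),
    "keywords": ["invoice number", "amount due", "payment terms", "remit to"],
}

print(check_category(vendor_invoice))  # → []
```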

How the classifier decides

For each file the classifier:

Reads the document (OCR if needed).
Compares it against each category's description and keywords.
Picks the highest-scoring category, returning the chosen category_id, a confidence value in [0, 1], and a one-sentence reasoning field explaining why.

If no category scores above the threshold (default 0.5), the file is left unclassified and classified_category_id is null. You can assign a category manually via the PATCH endpoint.
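The decision rule can be modeled as "take the highest score, then apply the threshold." This is a conceptual sketch of the behavior described above, not the service's implementation: the per-category scores and the cat_* IDs are hypothetical inputs, and the real classifier also returns a reasoning string, which is left out here.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.5  # documented default


def pick_category(
    scores: dict[str, float], threshold: float = DEFAULT_THRESHOLD
) -> tuple[Optional[str], float]:
    """Return (category_id, best_score); category_id is None when nothing clears the threshold."""
    if not scores:
        return None, 0.0
    best_id = max(scores, key=lambda cid: scores[cid])
    best_score = scores[best_id]
    # "Scores above the threshold" is read here as strictly greater than.
    if best_score > threshold:
        return best_id, best_score
    return None, best_score


print(pick_category({"cat_invoice": 0.82, "cat_receipt": 0.41}))  # → ('cat_invoice', 0.82)
print(pick_category({"cat_invoice": 0.31, "cat_receipt": 0.44}))  # → (None, 0.44)
```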

Confidence interpretation

Confidence	Meaning
> 0.9	Very high; safe to chain into automated flows.
0.7 – 0.9	High; consider a quick spot-check.
0.5 – 0.7	Medium; route to a human review queue.
< 0.5	Low; the model is unsure. At the default threshold the file is left unclassified (classified_category_id is null).
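A sketch of routing a verdict by these bands. The handling labels ("automate", "spot-check", and so on) are illustrative names, not values the API returns. Because the ranges as written share endpoints, this sketch treats each lower bound as exclusive, in line with "> 0.9" and "above the threshold"; that boundary choice is an assumption.

```python
def suggested_handling(confidence: float) -> str:
    """Map a classifier confidence in [0, 1] to the suggested handling for its band."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    # Lower bounds are exclusive (an assumption): 0.9 is "high", 0.5 is "low".
    if confidence > 0.9:
        return "automate"      # very high: safe to chain into automated flows
    if confidence > 0.7:
        return "spot-check"    # high: quick manual check
    if confidence > 0.5:
        return "human-review"  # medium: send to a review queue
    return "unclassified"      # low: at or below the default threshold


for c in (0.95, 0.8, 0.6, 0.3):
    print(c, suggested_handling(c))
```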

Chaining

When a file is classified into a category that has linked_extraction_id set, the system automatically uploads the same file to the linked extraction as soon as classification completes. This means one upload to the classifier triggers:

1 page charged for classification
N pages charged for the linked extraction (where N = the file's page count)

Classification is a flat 1-page charge per file regardless of length, so pages are not double-counted: only the chained extraction is billed at the per-page rate.
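Worked arithmetic: a file with N pages that lands in a chained category costs 1 + N pages in total, so a 12-page file costs 13. A minimal sketch, where the chained flag (whether the file's category has linked_extraction_id set) is a hypothetical input you would derive from the classification result:

```python
CLASSIFICATION_PAGES = 1  # flat per-file charge for classification


def billed_pages(page_count: int, chained: bool) -> int:
    """Pages billed for one upload to the classifier.

    chained: True if the file landed in a category with linked_extraction_id set.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    extraction_pages = page_count if chained else 0  # N pages at the per-page rate
    return CLASSIFICATION_PAGES + extraction_pages


print(billed_pages(12, chained=True))   # → 13
print(billed_pages(12, chained=False))  # → 1
```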

Overrides are sticky

If you PATCH a file's category, the override persists. The classifier won't reconsider that file on a re-classify unless you first clear the override by setting classified_category_id to null.
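A minimal sketch of building the two PATCH calls (set an override, then clear it) with Python's standard library. The base URL, the /files/{file_id} path, and the JSON body shape are hypothetical placeholders, since the exact endpoint isn't given here, and authentication is omitted. The requests are built but not sent.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical base URL and path; substitute the actual PATCH endpoint.
BASE_URL = "https://api.example.com/v1"


def category_patch(file_id: str, category_id: Optional[str]) -> urllib.request.Request:
    """Build (not send) a PATCH that sets an override, or clears it when category_id is None."""
    body = json.dumps({"classified_category_id": category_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/files/{file_id}",  # hypothetical path
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},  # add your auth header here
    )


pin = category_patch("file_123", "cat_invoice")  # sticky override
clear = category_patch("file_123", None)          # body: {"classified_category_id": null}
# urllib.request.urlopen(clear) would send the request.
```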

Updating categories

If you change a category's name, description, or keywords, existing files keep their original verdict. To re-run with the new definitions, call the Re-classify endpoint on each file.
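Because existing verdicts don't update on their own, re-running a batch means one call per file. A sketch that builds those calls; the /files/{file_id}/reclassify path, the POST method, and the base URL are hypothetical stand-ins for the actual Re-classify endpoint.

```python
import urllib.request

# Hypothetical base URL and path for the Re-classify endpoint.
BASE_URL = "https://api.example.com/v1"


def reclassify_requests(file_ids: list[str]) -> list[urllib.request.Request]:
    """Build one Re-classify call per file (not sent)."""
    return [
        urllib.request.Request(f"{BASE_URL}/files/{fid}/reclassify", method="POST")
        for fid in file_ids
    ]


batch = reclassify_requests(["file_1", "file_2", "file_3"])
# To send them:
# for req in batch:
#     with urllib.request.urlopen(req) as resp:
#         print(req.full_url, resp.status)
```

Files with a manual override keep it through these calls unless the override is cleared first.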

Best practices

Start with descriptions, add keywords later. A clear two-line description tends to outperform a long keyword list.
Avoid overlapping categories. "Invoice" and "Receipt" overlap; rename to "Vendor invoice (B2B)" vs "POS receipt (B2C)" if you need to distinguish.
Calibrate on 50 docs first. Run a small batch and review borderline cases before scaling up.
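The calibration step can be sketched as two small helpers: draw a random batch of up to 50 files, then pull out the cases worth reviewing. Here "borderline" is assumed to mean unclassified or in the medium band (above 0.5, up to 0.7), and the result dicts (file_id, category_id, confidence) are a hypothetical shape for whatever your client collects.

```python
import random

CALIBRATION_SIZE = 50


def calibration_sample(file_ids: list[str], seed: int = 0) -> list[str]:
    """Pick up to 50 distinct files at random for a first calibration run."""
    rng = random.Random(seed)  # seeded so the batch is reproducible
    return rng.sample(file_ids, min(CALIBRATION_SIZE, len(file_ids)))


def borderline_cases(results: list[dict], low: float = 0.5, high: float = 0.7) -> list[str]:
    """File IDs that were left unclassified or scored in the medium band."""
    return [
        r["file_id"]
        for r in results
        if r["category_id"] is None or low < r["confidence"] <= high
    ]
```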