Classification Details
Deeper notes on how classifications are configured.
Categories
A classification is just a set of categories. Each category has:
| Field | Required | Purpose |
|---|---|---|
| name | yes | The label shown in UIs and webhook payloads. |
| description | recommended | Plain-English explanation of what belongs in this category. The classifier uses this. |
| keywords | optional | Up to 20 words or phrases that strongly indicate this category. The classifier weights these. |
| linked_extraction_id | optional | If set, files in this category auto-chain into the linked extraction. |
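The fields above map naturally onto a small record type. The sketch below is a hypothetical client-side model: the `Category` class and its `validate` helper are not part of the API. It only encodes the rules stated in the table (`name` required, `description` recommended, at most 20 keywords).

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_KEYWORDS = 20  # limit stated in the Categories table


@dataclass
class Category:
    """Hypothetical client-side model of one category definition."""
    name: str                                           # required
    description: str = ""                               # recommended; the classifier uses it
    keywords: list[str] = field(default_factory=list)   # optional, up to 20
    linked_extraction_id: Optional[str] = None          # optional; enables chaining

    def validate(self) -> list[str]:
        """Return a list of issues; an empty list means the definition looks fine."""
        issues = []
        if not self.name.strip():
            issues.append("name is required")
        if not self.description.strip():
            issues.append("description is recommended (the classifier uses it)")
        if len(self.keywords) > MAX_KEYWORDS:
            issues.append(f"too many keywords: {len(self.keywords)} > {MAX_KEYWORDS}")
        return issues


invoice = Category(
    name="Vendor invoice (B2B)",
    description="Bills issued by a supplier to a business, with line items and payment terms.",
    keywords=["invoice number", "net 30", "remit to"],
)
print(invoice.validate())  # []
```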
How the classifier decides
For each file, the classifier:
- Reads the document (running OCR if needed).
- Compares it against each category's description and keywords.
- Picks the highest-scoring category and returns the chosen category_id, a confidence value in [0, 1], and a one-sentence reasoning field explaining the choice.
If no category scores above the threshold (default 0.5), the file
is left unclassified — classified_category_id is null. You
can manually assign it via the PATCH endpoint.
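The decision rule itself (take the best-scoring category, leave the file unclassified if nothing clears the threshold) can be sketched as follows. This only models the rule: how the real classifier computes per-category scores is not described here, so `scores` is an assumed input, and `Verdict` and `pick_category` are hypothetical names. A score exactly equal to the threshold is treated as not passing, following the wording "above the threshold"; the real boundary behavior is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_THRESHOLD = 0.5  # default threshold stated in the docs


@dataclass
class Verdict:
    classified_category_id: Optional[str]  # None mirrors a null in the API response
    confidence: float


def pick_category(scores: dict[str, float], threshold: float = DEFAULT_THRESHOLD) -> Verdict:
    """Pick the highest-scoring category, or leave the file unclassified.

    `scores` maps category_id -> score in [0, 1] (an assumed input).
    """
    if not scores:
        return Verdict(classified_category_id=None, confidence=0.0)
    best_id = max(scores, key=scores.get)
    best_score = scores[best_id]
    if best_score <= threshold:
        # Nothing scored above the threshold: classified_category_id stays null.
        return Verdict(classified_category_id=None, confidence=best_score)
    return Verdict(classified_category_id=best_id, confidence=best_score)


print(pick_category({"cat_invoice": 0.82, "cat_receipt": 0.41}))
# Verdict(classified_category_id='cat_invoice', confidence=0.82)
print(pick_category({"cat_invoice": 0.38, "cat_receipt": 0.41}))
# Verdict(classified_category_id=None, confidence=0.41)
```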
Confidence interpretation
- Above 0.9: Very high; safe to chain into automated flows.
- 0.7 – 0.9: High; consider a quick spot-check.
- 0.5 – 0.7: Medium; route to a human review queue.
- Below 0.5: Low; the model is unsure, and the file is left with a null category.
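These bands translate into a simple routing rule. In this sketch, `route` and its return labels are hypothetical names for stages in your own pipeline, not values returned by the API. Where adjacent bands share an endpoint (0.9, 0.7, 0.5), the side each endpoint falls on is an assumption.

```python
def route(confidence: float) -> str:
    """Map a confidence value to the handling suggested by the confidence bands.

    Labels are hypothetical pipeline stages, not API values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence > 0.9:
        return "auto"          # very high: safe to chain into automated flows
    if confidence >= 0.7:
        return "spot_check"    # high: quick spot-check
    if confidence >= 0.5:
        return "human_review"  # medium: human review queue
    return "unclassified"      # low: the file is left with a null category


for c in (0.95, 0.8, 0.6, 0.3):
    print(c, route(c))
```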
Chaining
When a category has linked_extraction_id set, the system
automatically uploads the same file to the linked extraction
as soon as classification completes. This means one upload to the
classifier triggers:
- 1 page charged for classification
- N pages charged for the linked extraction (where N = the file's page count)
Pages are not double-counted: classification costs a flat 1 page per file, and only the chained extraction is billed at the per-page rate.
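As a worked example of the billing rule, a 12-page file whose category is chained to an extraction costs 1 + 12 = 13 pages. The hypothetical helper below encodes that arithmetic, assuming a file in an unchained category is billed only the flat classification page.

```python
def pages_billed(page_count: int, chained: bool) -> int:
    """Pages billed for one upload to the classifier (hypothetical helper).

    Classification is a flat 1 page per file; a chained extraction adds
    the file's full page count.
    """
    if page_count < 1:
        raise ValueError("a file has at least one page")
    classification = 1
    extraction = page_count if chained else 0  # assumption: unchained adds nothing
    return classification + extraction


print(pages_billed(12, chained=True))   # 13
print(pages_billed(12, chained=False))  # 1
```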
Overrides are sticky
If you PATCH a file's category, the override sticks: the
classifier won't reconsider it on a re-run unless you also clear the
override (set classified_category_id to null).
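One way to reason about sticky overrides is as a flag that a re-run checks before touching the file. The sketch models that behavior locally and does not call the real API; `FileRecord`, the `overridden` flag, `patch_category`, and `rerun` are all hypothetical. The only documented facts it encodes are that a PATCH pins the category and that setting classified_category_id to null clears the override.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileRecord:
    """Hypothetical local view of a file's classification state."""
    file_id: str
    classified_category_id: Optional[str] = None
    overridden: bool = False  # True after a manual PATCH with a category


def patch_category(f: FileRecord, category_id: Optional[str]) -> None:
    """Model a PATCH: a category pins the file; None clears the override."""
    f.classified_category_id = category_id
    f.overridden = category_id is not None


def rerun(f: FileRecord, classify: Callable[[str], Optional[str]]) -> None:
    """Model a re-run: the classifier only touches files without an override."""
    if f.overridden:
        return  # sticky: the manual choice wins
    f.classified_category_id = classify(f.file_id)


f = FileRecord("file_1", "cat_receipt")
patch_category(f, "cat_invoice")
rerun(f, lambda _id: "cat_receipt")
print(f.classified_category_id)  # cat_invoice

patch_category(f, None)          # clear the override
rerun(f, lambda _id: "cat_receipt")
print(f.classified_category_id)  # cat_receipt
```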
Updating categories
If you change a category's name, description, or keywords,
existing files keep their original verdict. To re-run with the new
definitions, call the Re-classify endpoint on each file.
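Because definition changes are not applied retroactively, re-running after an edit means one Re-classify call per file. A minimal batch loop might look like this. The `reclassify` callable stands in for whatever client code calls the Re-classify endpoint; its name, signature, and response shape are assumptions, as is the fake used to exercise it.

```python
from typing import Callable, Iterable


def reclassify_all(
    file_ids: Iterable[str], reclassify: Callable[[str], dict]
) -> dict[str, dict]:
    """Call a (hypothetical) Re-classify wrapper once per file and collect responses.

    Failures are recorded per file so one bad file does not stop the batch.
    """
    results: dict[str, dict] = {}
    for file_id in file_ids:
        try:
            results[file_id] = reclassify(file_id)
        except Exception as exc:  # keep going; inspect failures afterwards
            results[file_id] = {"error": str(exc)}
    return results


def fake_reclassify(file_id: str) -> dict:
    """Stand-in for a real API call, used only to exercise the loop."""
    if file_id == "file_bad":
        raise RuntimeError("not found")
    return {"classified_category_id": "cat_invoice", "confidence": 0.9}


out = reclassify_all(["file_1", "file_bad"], fake_reclassify)
print(out["file_bad"])  # {'error': 'not found'}
```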
Best practices
- Start with descriptions, add keywords later. A clear two-line description tends to outperform a long keyword list.
- Avoid overlapping categories. "Invoice" and "Receipt" overlap; rename to "Vendor invoice (B2B)" vs "POS receipt (B2C)" if you need to distinguish.
- Calibrate on 50 docs first. Run a small batch and review borderline cases before scaling up.
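For the 50-document calibration run, it helps to pull out the borderline verdicts programmatically. This sketch assumes you have collected (file_id, confidence) pairs from the batch; the `borderline` function and the hand-written sample values are illustrative only, not real results. It uses the medium band (0.5 to 0.7) from the confidence guide and lists the weakest cases first.

```python
def borderline(
    results: list[tuple[str, float]], low: float = 0.5, high: float = 0.7
) -> list[tuple[str, float]]:
    """Return files in the medium-confidence band, lowest confidence first."""
    picked = [(fid, c) for fid, c in results if low <= c < high]
    return sorted(picked, key=lambda item: item[1])


# Illustrative values only; in practice these come from your calibration batch.
sample = [("doc_01", 0.93), ("doc_02", 0.64), ("doc_03", 0.51), ("doc_04", 0.78), ("doc_05", 0.42)]
print(borderline(sample))  # [('doc_03', 0.51), ('doc_02', 0.64)]
```

If a large share of the batch lands in this band, that usually points back to the first two practices: descriptions that need sharpening or categories that overlap.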