🖥 Data Extraction — API Endpoints
The complete reference for the Data Extraction API. All endpoints are prefixed with:
https://api.docparse-labs.vercel.app/v1
All requests require the Authorization: Bearer <key> header (see Authentication).
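As an illustration of the base URL and header rule above, here is a minimal sketch that builds (but does not send) an authenticated request with Python's standard library. The helper name `build_request`, the placeholder key, and the extraction ID `ext_123` are assumptions made for the example, not part of the API.

```python
import urllib.request

BASE_URL = "https://api.docparse-labs.vercel.app/v1"


def build_request(method, path, api_key, data=None, content_type=None):
    """Return a urllib Request for `path` under the base URL.

    Hypothetical helper: it only attaches the Bearer header described
    above; nothing is sent until the request is passed to urlopen().
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type is not None:
        headers["Content-Type"] = content_type
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# "ext_123" and "YOUR_API_KEY" are placeholders for illustration.
req = build_request("GET", "/extractions/ext_123/batches", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; keeping construction separate makes the header logic easy to check in isolation.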
POST Create an Extraction
Create a new extraction (i.e. a schema you'll upload documents to).
POST /extractions
Body
{
  "name": "Invoice extractor",
  "description": "Pull totals and line items from US invoices",
  "language": "Multi-Lingual",
  "fields": [
    { "key": "invoice_number", "type": "string" },
    { "key": "total_amount", "type": "number" },
    { "key": "issue_date", "type": "date" },
    { "key": "line_items", "type": "list<object>", "fields": [
      { "key": "description", "type": "string" },
      { "key": "qty", "type": "number" },
      { "key": "unit_price", "type": "number" }
    ]}
  ]
}
Response — 201 Created
{
  "extraction": {
    "id": "ext_01HQX...",
    "name": "Invoice extractor",
    "status": "ready",
    "created_at": "2026-05-24T10:00:00Z"
  }
}
POST Add Files to an Extraction
Upload one or more documents to an existing extraction. Each call creates a new batch.
POST /extractions/{extraction_id}/batches
Content-Type: multipart/form-data
Form fields
| Field | Type | Required | Notes |
|---|---|---|---|
| files | File[] | yes | Up to 30 files, 25 MB each. |
| name | string | no | Optional human-readable batch label. |
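The limits in the table can be checked client-side before an upload is attempted. The sketch below is one way to do that; `check_batch` is a hypothetical helper, and it assumes that "25 MB" means 25 × 1024 × 1024 bytes, which this reference does not specify.

```python
MAX_FILES = 30
# Assumption: binary megabytes. The reference only says "25 MB".
MAX_FILE_BYTES = 25 * 1024 * 1024


def check_batch(files):
    """Validate a planned upload against the documented limits.

    `files` is a list of (file_name, size_in_bytes) tuples.
    Returns a list of human-readable problems; an empty list means
    the batch fits within the limits in the table above.
    """
    problems = []
    if not files:
        problems.append("at least one file is required")
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files exceeds the limit of {MAX_FILES}")
    for name, size in files:
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is larger than 25 MB")
    return problems


print(check_batch([("INV-1024.pdf", 120_000)]))
```

Checking locally avoids spending a request on an upload that would be rejected with 413.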
Response — 202 Accepted
{
  "batch": {
    "id": "btc_01HQX...",
    "extraction_id": "ext_01HQX...",
    "status": "queued",
    "file_count": 3,
    "page_count": 14
  }
}
GET Get Batch Status
Poll for batch progress. Returns the batch and one row per file with its current status.
GET /extractions/{extraction_id}/batches/{batch_id}
Response — 200 OK
{
  "batch": {
    "id": "btc_01HQX...",
    "status": "processed",
    "file_count": 3,
    "page_count": 14
  },
  "files": [
    {
      "id": "file_01HQX...",
      "file_name": "INV-1024.pdf",
      "status": "processed",
      "page_count": 3
    }
  ]
}
Possible status values: queued, processing, processed, needs_review, failed.
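The polling described in this section can be sketched as a small loop. This is an illustration under stated assumptions: it treats processed, needs_review, and failed as final statuses (the reference lists the values but does not say which are terminal), and `wait_for_batch`, its interval, and its polling budget are hypothetical choices.

```python
import time

# Assumption: these statuses are final. The reference does not state this.
TERMINAL = {"processed", "needs_review", "failed"}


def wait_for_batch(fetch, interval=2.0, max_polls=60, sleep=time.sleep):
    """Call fetch() until the batch reaches a terminal status.

    `fetch` is any callable returning the parsed JSON body of
    GET /extractions/{extraction_id}/batches/{batch_id}.
    Raises TimeoutError if max_polls is exhausted first.
    """
    for attempt in range(max_polls):
        body = fetch()
        if body["batch"]["status"] in TERMINAL:
            return body
        if attempt < max_polls - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError("batch did not finish within the polling budget")
```

Injecting `fetch` and `sleep` keeps the loop independent of any HTTP client and lets it be exercised without network access.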
GET Get File Result
Fetch the extracted data for a specific file once it's processed.
GET /extractions/{extraction_id}/files/{file_id}/result
Response — 200 OK
{
  "file_id": "file_01HQX...",
  "result": {
    "invoice_number": "INV-1024",
    "total_amount": 4108.26,
    "issue_date": "2026-05-24",
    "line_items": [
      { "description": "API access", "qty": 1, "unit_price": 4108.26 }
    ]
  },
  "confidence": {
    "invoice_number": 0.99,
    "total_amount": 0.98
  },
  "model_used": "gemini-2.5-flash"
}
GET List Batches
GET /extractions/{extraction_id}/batches?limit=20&before=2026-05-24T10:00:00Z
Cursor-paginated by created_at descending. Default limit=20, max 100.
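A sketch of how a client might walk through every page follows. The response shape for this endpoint is not documented here, so the code makes three labeled assumptions: each batch row carries a created_at field, `before` is exclusive, and a page shorter than `limit` means there are no more results. `batches_path` and `iter_batches` are hypothetical helper names.

```python
from urllib.parse import urlencode

MAX_LIMIT = 100  # documented maximum


def batches_path(extraction_id, limit=20, before=None):
    """Build the path and query string for the List Batches endpoint."""
    params = {"limit": max(1, min(limit, MAX_LIMIT))}
    if before is not None:
        params["before"] = before  # urlencode percent-encodes the colons
    return f"/extractions/{extraction_id}/batches?{urlencode(params)}"


def iter_batches(fetch_page, limit=20):
    """Yield batches newest-first across pages.

    `fetch_page(limit, before)` must return a list of batch dicts.
    Assumption: each dict has a "created_at" key usable as the next cursor.
    """
    limit = max(1, min(limit, MAX_LIMIT))
    before = None
    while True:
        page = fetch_page(limit, before)
        yield from page
        if len(page) < limit:  # assumption: a short page is the last page
            return
        before = page[-1]["created_at"]


print(batches_path("ext_123", limit=20, before="2026-05-24T10:00:00Z"))
```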
DELETE Delete a File
Removes the file and its extracted data. Pages are not refunded — processing already happened.
DELETE /extractions/{extraction_id}/files/{file_id}
Response — 204 No Content
Error responses
All errors are JSON with an error field:
{ "error": "File too large (25 MB limit)." }
| Status | Meaning |
|---|---|
| 400 | Bad request — check the body or query. |
| 401 | Missing or invalid API key. |
| 403 | Key revoked or out of pages. |
| 404 | Extraction / batch / file not found. |
| 413 | File too large. |
| 429 | Rate-limited; back off and retry. |
| 5xx | Our problem; we retry on your behalf if you use webhooks. |
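The table suggests a simple client-side policy: back off and retry on 429, and surface the error field for everything else. The sketch below illustrates that policy only. `ApiError`, `call_with_backoff`, the exponential delay schedule, and the attempt count are assumptions for the example; the reference does not prescribe a retry strategy.

```python
import json
import time


class ApiError(Exception):
    """Hypothetical exception carrying the HTTP status and the error field."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run send() and retry only on 429, doubling the delay each time.

    `send` is any callable returning (status_code, body_text).
    Returns the parsed JSON body on success (None for an empty body,
    such as a 204 response) and raises ApiError otherwise.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status < 400:
            return json.loads(body) if body else None
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
            continue
        try:
            message = json.loads(body).get("error", "unknown error")
        except (ValueError, TypeError, AttributeError):
            message = body or "unknown error"  # body was not the documented JSON
        raise ApiError(status, message)
```

Other 4xx responses are raised immediately because repeating the same request would not change the outcome.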