📄DocParse Docs

🖥 Data Extraction — API Endpoints

The complete reference for the Data Extraction API. All endpoints are prefixed with:

plaintext
https://api.docparse-labs.vercel.app/v1

All requests require the Authorization: Bearer <key> header (see Authentication).
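
As a minimal sketch of how the base URL and the bearer header fit together, the helper below builds (but does not send) a request using only the Python standard library. The name `build_request`, the placeholder key, the `ext_123` ID, and the JSON `Content-Type` for request bodies are illustrative assumptions, not part of the documented API.

```python
import json
import urllib.request

BASE_URL = "https://api.docparse-labs.vercel.app/v1"


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request for the API."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # Assumption: JSON bodies are sent as application/json.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# "ext_123" and "YOUR_API_KEY" are placeholders.
req = build_request("GET", "/extractions/ext_123/batches", api_key="YOUR_API_KEY")
```

Passing `req` to `urllib.request.urlopen` would send it once a real key is in place.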


POST Create an Extraction

Create a new extraction, i.e. the field schema that the documents you upload are parsed against.

plaintext
POST /extractions

Body

json
{
  "name": "Invoice extractor",
  "description": "Pull totals and line items from US invoices",
  "language": "Multi-Lingual",
  "fields": [
    { "key": "invoice_number", "type": "string" },
    { "key": "total_amount",   "type": "number" },
    { "key": "issue_date",     "type": "date" },
    { "key": "line_items",     "type": "list<object>", "fields": [
      { "key": "description", "type": "string" },
      { "key": "qty",         "type": "number" },
      { "key": "unit_price",  "type": "number" }
    ]}
  ]
}

Response — 201 Created

json
{
  "extraction": {
    "id": "ext_01HQX...",
    "name": "Invoice extractor",
    "status": "ready",
    "created_at": "2026-05-24T10:00:00Z"
  }
}
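
The request body above can be assembled programmatically, which helps when the schema has nested fields. The sketch below is one possible way to do it; the helpers `field` and `extraction_id` are hypothetical names, and only the field types shown in the example body are used.

```python
import json


def field(key, type_, subfields=None):
    """Return one field definition; nested fields are used for list<object>."""
    definition = {"key": key, "type": type_}
    if subfields is not None:
        definition["fields"] = subfields
    return definition


payload = {
    "name": "Invoice extractor",
    "description": "Pull totals and line items from US invoices",
    "language": "Multi-Lingual",
    "fields": [
        field("invoice_number", "string"),
        field("total_amount", "number"),
        field("issue_date", "date"),
        field("line_items", "list<object>", [
            field("description", "string"),
            field("qty", "number"),
            field("unit_price", "number"),
        ]),
    ],
}
body = json.dumps(payload)  # body for POST /extractions


def extraction_id(response_text):
    """Pull the new extraction's id out of a 201 response body."""
    return json.loads(response_text)["extraction"]["id"]
```

The returned `id` is what the batch and file endpoints expect as `{extraction_id}`.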

POST Add Files to an Extraction

Upload one or more documents to an existing extraction. Each call creates a new batch.

plaintext
POST /extractions/{extraction_id}/batches
Content-Type: multipart/form-data

Form fields

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| files | File[] | yes | Up to 30 files, 25 MB each. |
| name | string | no | Optional human-readable batch label. |

Response — 202 Accepted

json
{
  "batch": {
    "id": "btc_01HQX...",
    "extraction_id": "ext_01HQX...",
    "status": "queued",
    "file_count": 3,
    "page_count": 14
  }
}
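
Checking the documented limits on the client avoids a round trip that would end in a 413. The following sketch validates a batch and encodes it as `multipart/form-data` with the standard library. It assumes that "25 MB" means 25 MiB and that a `File[]` field is sent as repeated `files` parts; the reference states neither, and the helper names are hypothetical.

```python
import uuid

MAX_FILES = 30
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" means 25 MiB


def check_batch(files):
    """files: list of (filename, bytes). Raise ValueError if limits are exceeded."""
    if not files:
        raise ValueError("a batch needs at least one file")
    if len(files) > MAX_FILES:
        raise ValueError(f"{len(files)} files exceeds the {MAX_FILES}-file limit")
    for name, content in files:
        if len(content) > MAX_FILE_BYTES:
            raise ValueError(f"{name} exceeds the 25 MB limit")


def encode_multipart(files, batch_name=None):
    """Return (content_type, body) for a multipart/form-data upload."""
    check_batch(files)
    boundary = uuid.uuid4().hex
    parts = []
    if batch_name is not None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="name"\r\n\r\n'
            f"{batch_name}\r\n".encode("utf-8")
        )
    for filename, content in files:
        # Assumption: each file is sent as its own part named "files".
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="files"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

The returned content type goes in the `Content-Type` header and the bytes become the body of the `POST /extractions/{extraction_id}/batches` request.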

GET Get Batch Status

Poll for batch progress. Returns the batch and one row per file with its current status.

plaintext
GET /extractions/{extraction_id}/batches/{batch_id}

Response — 200 OK

json
{
  "batch": {
    "id": "btc_01HQX...",
    "status": "processed",
    "file_count": 3,
    "page_count": 14
  },
  "files": [
    {
      "id": "file_01HQX...",
      "file_name": "INV-1024.pdf",
      "status": "processed",
      "page_count": 3
    }
  ]
}

Possible status values: queued, processing, processed, needs_review, failed.
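
A polling loop over this endpoint might look like the sketch below. It assumes that `processed`, `needs_review`, and `failed` are terminal while `queued` and `processing` are not; the reference lists the values without saying which ones are final. The function name, the interval, and the timeout are illustrative choices.

```python
import time

# Assumption: these three statuses mean the batch will not change further.
TERMINAL_STATUSES = {"processed", "needs_review", "failed"}


def wait_for_batch(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the batch reaches a terminal status or the timeout passes.

    fetch_status is any zero-argument callable returning the parsed JSON of
    GET /extractions/{extraction_id}/batches/{batch_id}.
    """
    waited = 0.0
    while True:
        payload = fetch_status()
        if payload["batch"]["status"] in TERMINAL_STATUSES:
            return payload
        if waited >= timeout:
            raise TimeoutError("batch did not finish in time")
        sleep(interval)
        waited += interval
```

Taking `fetch_status` and `sleep` as parameters keeps the loop independent of any HTTP client and makes it easy to exercise with canned responses.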


GET Get File Result

Fetch the extracted data for a specific file once it's processed.

plaintext
GET /extractions/{extraction_id}/files/{file_id}/result

Response — 200 OK

json
{
  "file_id": "file_01HQX...",
  "result": {
    "invoice_number": "INV-1024",
    "total_amount": 4108.26,
    "issue_date": "2026-05-24",
    "line_items": [
      { "description": "API access", "qty": 1, "unit_price": 4108.26 }
    ]
  },
  "confidence": {
    "invoice_number": 0.99,
    "total_amount": 0.98
  },
  "model_used": "gemini-2.5-flash"
}
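
The `confidence` object can drive a simple review step. The sketch below flags every extracted field whose score is below a cutoff or absent. The 0.9 cutoff is arbitrary, and treating a missing score as "needs a look" is an assumption, since the example response only carries scores for two of the four fields.

```python
def low_confidence_fields(payload, threshold=0.9):
    """Return result keys whose confidence is missing or below the threshold."""
    confidence = payload.get("confidence", {})
    flagged = []
    for key in payload["result"]:
        score = confidence.get(key)
        # Assumption: a field with no score is treated as low confidence.
        if score is None or score < threshold:
            flagged.append(key)
    return flagged


sample = {
    "file_id": "file_01HQX...",
    "result": {
        "invoice_number": "INV-1024",
        "total_amount": 4108.26,
        "issue_date": "2026-05-24",
        "line_items": [
            {"description": "API access", "qty": 1, "unit_price": 4108.26}
        ],
    },
    "confidence": {"invoice_number": 0.99, "total_amount": 0.98},
}
print(low_confidence_fields(sample))  # ['issue_date', 'line_items']
```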

GET List Batches

plaintext
GET /extractions/{extraction_id}/batches?limit=20&before=2026-05-24T10:00:00Z

Results are cursor-paginated by created_at, newest first. Pass before=<timestamp> to fetch batches created before that time. limit defaults to 20 and cannot exceed 100.
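
Walking every page with the `before` cursor could be sketched as follows. The reference does not show the list response, so the top-level `batches` key and the per-batch `created_at` field are assumptions, as are the function and parameter names.

```python
from urllib.parse import urlencode


def iter_batches(fetch_page, extraction_id, limit=20):
    """Yield every batch, newest first, following the `before` cursor.

    fetch_page is any callable that takes a path-plus-query string and
    returns the parsed JSON page.
    """
    before = None
    while True:
        query = {"limit": limit}
        if before is not None:
            query["before"] = before
        page = fetch_page(f"/extractions/{extraction_id}/batches?{urlencode(query)}")
        batches = page["batches"]  # assumed response key
        yield from batches
        if len(batches) < limit:
            return  # a short page means there is nothing older
        # Assumed field: the oldest batch on this page becomes the next cursor.
        before = batches[-1]["created_at"]
```

A page shorter than `limit` ends the iteration; a page of exactly `limit` items triggers one more request, which returns an empty list if nothing older exists.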


DELETE Delete a File

Removes the file and its extracted data. Pages are not refunded, because processing has already happened.

plaintext
DELETE /extractions/{extraction_id}/files/{file_id}

Response — 204 No Content


Error responses

All errors are JSON with an error field:

json
{ "error": "File too large (25 MB limit)." }

| Status | Meaning |
| --- | --- |
| 400 | Bad request: check the body or query. |
| 401 | Missing or invalid API key. |
| 403 | Key revoked or out of pages. |
| 404 | Extraction / batch / file not found. |
| 413 | File too large. |
| 429 | Rate-limited; back off and retry. |
| 5xx | Our problem; we retry on your behalf if you use webhooks. |
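
One way a client might act on these codes is sketched below: it reads the `error` field, retries 429 responses with exponential backoff, and raises on everything else. Retrying 5xx from the client as well is a design choice of this sketch, not something the reference prescribes, and `ApiError`, `call_with_retry`, and the delay values are hypothetical.

```python
import json
import random
import time


class ApiError(Exception):
    """Raised for any error response that is not retried."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds or a non-retryable error occurs.

    send is any zero-argument callable returning (status_code, body_text).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return json.loads(body) if body else None  # 204 has no body
        try:
            message = json.loads(body)["error"]
        except (ValueError, KeyError, TypeError):
            message = body  # fall back to the raw text if it is not the documented shape
        retryable = status == 429 or status >= 500
        if not retryable or attempt == max_attempts:
            raise ApiError(status, message)
        # Exponential backoff (1s, 2s, 4s, ...) plus a little jitter.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
```

Errors such as 400, 401, 403, 404, and 413 are raised immediately because repeating the same request would not change the outcome.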