> For the complete documentation index, see [llms.txt](https://docs.nected.ai/nected-docs/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.nected.ai/nected-docs/workflow/add-node/action-nodes/ai-agent-node/document-classifier.md).

# Document Classifier

The **Document Classifier Node** lets your workflows read, categorize, and extract information from documents such as PDFs, Word files, or text-based reports.\
It automates document understanding at scale, whether you are sorting invoices by type, detecting content sensitivity, or pulling attributes such as customer names or transaction values.

Before using this node, ensure that your workspace is connected with a valid **AI Provider** (e.g., OpenAI, GoogleAI, Anthropic, or Vertex AI).\
[Learn how to integrate AI Providers →](https://docs.nected.ai/nected-docs/integrations/ai-providers)

<details>

<summary>How It Works</summary>

When you upload or link a document to this node, Nected sends it to the selected AI provider for processing. The model classifies or extracts relevant information based on the chosen **Classification Type**, then returns structured JSON results that you can pass on to other workflow nodes.

This removes the need for manual document parsing: the AI can directly recognize the document type, detect sensitive content, or extract predefined attributes.

</details>

### **Configuration Tabs**

#### **1. Input Params**

**Connector**

Select the connected AI provider you want to use (e.g., *Financial Advisor asst*).\
This determines which model or assistant will handle the document classification task.

**Assistant ID/Model name**

You can run this task with either a standard AI model or one of your own AI assistants. If you use a custom assistant from your provider (such as one created with OpenAI’s Assistants API), select its **Assistant ID** to reuse its pre-configured instructions, tasks, or datasets. If you use a standard model, you only need to enter the model name.

**Document URL**

Provide the direct URL of the document you want to analyze.\
You can also pass a **token-based dynamic link** generated from previous nodes (e.g., a document uploaded through an API or fetched from storage).

**OR → OpenAI File ID**

If you’re using OpenAI’s file-based API, you can enter the **File ID** here instead of a URL.\
This is ideal when you’ve pre-uploaded files directly into your OpenAI workspace.
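
The URL-or-File-ID choice above can be checked before a workflow ever reaches the node. The following is a minimal sketch, assuming the two inputs are mutually exclusive; `resolve_document_source` and its return shape are hypothetical helpers for illustration, not part of Nected's API.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_document_source(document_url: Optional[str] = None,
                            file_id: Optional[str] = None) -> dict:
    """Return exactly one document source (hypothetical helper, not a Nected API)."""
    has_url = bool(document_url and document_url.strip())
    has_file_id = bool(file_id and file_id.strip())

    # Assumption: the node expects either a URL or a File ID, never both.
    if has_url == has_file_id:
        raise ValueError("Provide either a document URL or an OpenAI File ID.")

    if has_url:
        cleaned = document_url.strip()
        parsed = urlparse(cleaned)
        # A "direct URL" is taken here to mean an absolute http(s) link.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a direct http(s) URL: {cleaned!r}")
        return {"kind": "url", "value": cleaned}

    return {"kind": "file_id", "value": file_id.strip()}
```

Running a check like this in an upstream step lets a misconfigured request fail early with a readable message instead of surfacing as a provider error.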

<figure><img src="/files/t9YwDJQgrMhMqLQezehW" alt=""><figcaption></figcaption></figure>

#### **Classification Type**

This determines the type of analysis or classification that will be performed on your document. Each option enables a distinct document intelligence capability:

| **Classification Type**   | **What It Does**                                                                                               | **Example Use Cases**                                                                                                |
| ------------------------- | -------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| **Document Type**         | Automatically identifies what type of document it is (invoice, report, contract, ID proof, etc.).              | Sorting uploaded documents, detecting invalid uploads                                                                |
| **Content Category**      | Analyzes the topic or subject of the document’s content.                                                       | Categorizing reports into “Financial”, “Legal”, “Marketing”, etc.                                                    |
| **Sensitivity Analysis**  | Detects whether the document contains sensitive information (like PII, financial details, or restricted data). | Flagging confidential documents for restricted access                                                                |
| **Language Detection**    | Identifies the language(s) present in the document.                                                            | Sorting multilingual submissions, routing translation tasks                                                          |
| **Quality Assessment**    | Evaluates whether the document meets quality criteria such as clarity, completeness, or correctness.           | Validating OCR scans, checking data completeness                                                                     |
| **Custom Categories**     | Lets you define your own custom document classes via prompts or assistant instructions.                        | Categorizing documents into internal business-specific labels (e.g., “KYC Docs”, “Tax Filings”, “Client Agreements”) |
| **Attributes Extraction** | Extracts structured data points from the document like name, ID, amount, or date.                              | Pulling invoice totals, contract dates, or policy numbers                                                            |

#### **Additional Options**

**Image/PDF Pre-LLM OCR**

Enable this option if your document is an image or a scanned PDF.\
When turned on, Nected will first apply **Optical Character Recognition (OCR)** to convert the visual content into text before sending it to the AI model.\
Use this for:

* Scanned documents
* Image-based reports or receipts
* Handwritten or camera-captured forms

**PDF Pre-LLM Text Extraction**

Enable this to extract embedded text from a PDF *before* it reaches the AI model.\
For machine-readable PDFs, this typically improves accuracy and speeds up classification.
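
As a rough guide to choosing between the two options, the sketch below suggests a setting from the file extension. It is only an illustration: `suggest_preprocessing`, the extension list, and the returned keys are hypothetical names, and whether a PDF has an embedded text layer is something the caller is assumed to know.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of image formats; extend it to match what you actually receive.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def suggest_preprocessing(document_url: str, pdf_has_text_layer: bool = True) -> dict:
    """Suggest which pre-LLM option to enable (hypothetical helper)."""
    # Use only the URL path so query strings such as access tokens are ignored.
    suffix = PurePosixPath(urlparse(document_url).path).suffix.lower()
    is_image = suffix in IMAGE_EXTENSIONS
    is_pdf = suffix == ".pdf"
    return {
        # Images and scanned PDFs have no embedded text, so OCR is needed.
        "pre_llm_ocr": is_image or (is_pdf and not pdf_has_text_layer),
        # Machine-readable PDFs already contain text that can be extracted.
        "pre_llm_text_extraction": is_pdf and pdf_has_text_layer,
    }
```

For other formats, such as Word files, the sketch leaves both options off.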

<figure><img src="/files/RU6SYZsv2NHSFSOcs6Gn" alt=""><figcaption></figcaption></figure>

#### **Add Variable**

You can add optional variables to further customize how the document is processed.

* **Instructions**\
  Add specific guidance for the AI model, for instance:

  > “Extract only the sender’s name, invoice date, and total amount.”\
  > “Classify based on business department, not document format.”

  This variable helps steer how the model interprets the document.
* **Thread ID**\
  Assign a Thread ID if you’re processing related documents (e.g., a batch of forms for the same client).\
  The model can maintain continuity across multiple documents and reuse context for consistent results.

#### **2. Test Results**

Once the inputs are configured, click **Test Node** to preview the output.\
Results are displayed in structured **JSON format**, including:

* Document classification results (type, category, sensitivity, etc.)
* Extracted attributes or entities
* Confidence scores
* Reference IDs and processing time

For example:

```json
{
  "type": "document_type",
  "label": "Invoice",
  "confidence": 0.96,
  "extracted_fields": {
    "Invoice No": "INV-2045",
    "Amount": "₹12,500",
    "Date": "2024-06-12"
  }
}
```

You can switch between **Raw**, **Pretty**, or **Table** views to inspect the result.
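
A downstream step might consume this output along the following lines. This is a sketch that assumes the response has the same shape as the example above; the `summarize_result` helper and the 0.9 review threshold are illustrative choices, not Nected defaults.

```python
import json

# The sample payload from this page; real responses may carry other fields.
SAMPLE_RESULT = """
{
  "type": "document_type",
  "label": "Invoice",
  "confidence": 0.96,
  "extracted_fields": {
    "Invoice No": "INV-2045",
    "Amount": "₹12,500",
    "Date": "2024-06-12"
  }
}
"""


def summarize_result(raw_json: str, min_confidence: float = 0.9) -> dict:
    """Parse a classification result and flag low-confidence ones for review."""
    result = json.loads(raw_json)
    confidence = float(result.get("confidence", 0.0))
    return {
        "label": result.get("label", "Unknown"),
        "confidence": confidence,
        # Assumed policy: anything below the threshold goes to a human.
        "needs_review": confidence < min_confidence,
        "fields": result.get("extracted_fields", {}),
    }


summary = summarize_result(SAMPLE_RESULT)
print(summary["label"], summary["needs_review"])  # prints: Invoice False
```

Using `.get()` with defaults keeps the helper tolerant of classification types that return no `extracted_fields`.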

#### **3. Settings**

This tab controls the behavior and performance of your classification request.

* **Timeout for API (s):** Time limit for the document classification request.
* **Timeout for Webhook/Cron (s):** Time limit for scheduled or webhook-triggered tasks.
* **Continue on Error:** Allow the workflow to move forward even if the classification fails.
* **Max Tokens:** Caps the length of the AI response, measured in tokens.
* **Temperature:** Controls how random the model's output is. Use lower values for predictable, repeatable results and higher values for more open-ended classification.
* **Max Retries:** Sets how many times Nected retries a failed request.
* **Metadata Required:** Enable to include metadata like file name, type, or page count in the response.
* **Cache & Expiry:** Cache processed document results to optimize repeated classification or re-runs.
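
To make the interaction between **Max Retries** and **Continue on Error** concrete, here is a conceptual sketch. It is not Nected's implementation: it assumes that Max Retries counts additional attempts after the first one, and `call_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, Optional


def call_with_retries(request: Callable[[], Any],
                      max_retries: int = 2,
                      continue_on_error: bool = False,
                      backoff_seconds: float = 0.0) -> Optional[Any]:
    """Run `request`, retrying on failure (illustrative semantics only)."""
    if max_retries < 0:
        raise ValueError("max_retries must be zero or greater")

    attempts = max_retries + 1  # assumption: first try plus N retries
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return request()
        except Exception as exc:
            last_error = exc
            # Wait a little longer before each new attempt, if configured.
            if attempt < attempts - 1 and backoff_seconds > 0:
                time.sleep(backoff_seconds * (attempt + 1))

    if continue_on_error:
        # The workflow moves on without a classification result.
        return None
    raise last_error
```

With `continue_on_error=True`, later nodes should be prepared to handle a missing result.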

### **Practical Examples**

1. **Automated Document Sorting**\
   Upload mixed files (invoices, ID proofs, contracts) and let Nected auto-categorize them using the *Document Type* classifier before saving them to their respective folders.
2. **Sensitive Content Filtering**\
   Scan large batches of uploaded PDFs using *Sensitivity Analysis* to automatically detect and flag confidential or PII-containing files.
3. **Attribute Extraction for Operations**\
   Use *Attributes Extraction* to pull structured data from bills or forms (e.g., vendor name, invoice total, and payment date) and forward that data to your ERP or CRM.
4. **Custom Business Classifications**\
   Configure *Custom Categories* to align AI classification with your internal labels, such as “Compliance Docs”, “Customer KYC”, or “Audit Reports”.
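
The sorting scenario in the first example could be sketched as a simple lookup from the returned label to a destination folder. The folder paths and the `destination_for` helper are hypothetical, and the labels are assumed to match what your classifier actually returns.

```python
# Hypothetical mapping from classifier labels to storage folders.
DESTINATIONS = {
    "invoice": "finance/invoices",
    "contract": "legal/contracts",
    "id proof": "kyc/id-proofs",
}


def destination_for(label: str, default: str = "unsorted/manual-review") -> str:
    """Pick a folder for a classified document, ignoring case and padding."""
    return DESTINATIONS.get(label.strip().lower(), default)


print(destination_for("Invoice"))  # prints: finance/invoices
```

Unrecognized labels fall back to a manual-review folder, so unexpected document types are not silently misfiled.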


