Document Classifier

The Document Classifier Node lets your workflows read, categorize, and extract information from documents such as PDFs, Word files, or text-based reports. Use it to automate document understanding at scale: sorting invoices by type, detecting sensitive content, or pulling attributes such as customer names or transaction values.

Before using this node, ensure that your workspace is connected with a valid AI Provider (e.g., OpenAI, GoogleAI, Anthropic, or Vertex AI). Learn how to integrate AI Providers →

How It Works

When you upload or link a document to this node, Nected sends it to the selected AI provider for processing. The model classifies or extracts relevant information based on the chosen Classification Type, then returns structured JSON results that you can pass on to other workflow nodes.

This removes the need for manual document parsing: the AI can recognize the document type, detect sensitive content, or extract predefined attributes directly.

Configuration Tabs

1. Input Params

Connector

Select the connected AI provider you want to use (e.g., Financial Advisor asst). This determines which model or assistant will handle the document classification task.

Assistant ID/Model name

You can run this task with either a standard model or one of your AI assistants. If you have a custom assistant configured with your provider (for example, through OpenAI’s Assistants API), select its Assistant ID to reuse its pre-configured instructions, tasks, or datasets. To use a standard model instead, enter only the model name.

Document URL

Provide the direct URL of the document you want to analyze. You can also pass a token-based dynamic link generated from previous nodes (e.g., a document uploaded through an API or fetched from storage).

OR → OpenAI File ID

If you’re using OpenAI’s file-based API, you can enter the File ID here instead of a URL. This is ideal when you’ve pre-uploaded files directly into your OpenAI workspace.
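
Because the Document URL and the OpenAI File ID are alternatives, it can help to check upstream that exactly one of them is set before the node runs. The following is a minimal sketch under that assumption; `DocumentSource`, `validate_source`, and the field names are hypothetical and not part of Nected.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class DocumentSource:
    """Hypothetical container for the two alternative inputs."""
    document_url: Optional[str] = None
    openai_file_id: Optional[str] = None


def validate_source(source: DocumentSource) -> str:
    """Return which input is in use, or raise if the inputs are ambiguous."""
    has_url = bool(source.document_url)
    has_file_id = bool(source.openai_file_id)
    # Both set or both empty: the caller must pick exactly one.
    if has_url == has_file_id:
        raise ValueError("Provide exactly one of document_url or openai_file_id")
    if has_url:
        parsed = urlparse(source.document_url)
        # Assumption: only absolute http(s) links count as a direct URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("document_url must be an absolute http(s) URL")
        return "url"
    return "file_id"
```

A check like this would typically run in the step that prepares the node's inputs, so that a missing or malformed link fails early rather than at the AI provider.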

Classification Type

This determines the type of analysis or classification that will be performed on your document. Each option enables a distinct document intelligence capability:

| Classification Type | What It Does | Example Use Cases |
| --- | --- | --- |
| Document Type | Automatically identifies what type of document it is (invoice, report, contract, ID proof, etc.). | Sorting uploaded documents, detecting invalid uploads |
| Content Category | Analyzes the topic or subject of the document’s content. | Categorizing reports into “Financial”, “Legal”, “Marketing”, etc. |
| Sensitivity Analysis | Detects whether the document contains sensitive information (like PII, financial details, or restricted data). | Flagging confidential documents for restricted access |
| Language Detection | Identifies the language(s) present in the document. | Sorting multilingual submissions, routing translation tasks |
| Quality Assessment | Evaluates whether the document meets quality criteria such as clarity, completeness, or correctness. | Validating OCR scans, checking data completeness |
| Custom Categories | Lets you define your own custom document classes via prompts or assistant instructions. | Categorizing documents into internal business-specific labels (e.g., “KYC Docs”, “Tax Filings”, “Client Agreements”) |
| Attributes Extraction | Extracts structured data points from the document like name, ID, amount, or date. | Pulling invoice totals, contract dates, or policy numbers |

Additional Options

Image/PDF Pre-LLM OCR

Enable this option if your document is an image or a scanned PDF. When turned on, Nected will first apply Optical Character Recognition (OCR) to convert the visual content into text before sending it to the AI model. Use this for:

  • Scanned documents

  • Image-based reports or receipts

  • Handwritten or camera-captured forms

PDF Pre-LLM Text Extraction

Enable this to extract embedded text from a PDF before it reaches the AI model. For machine-readable PDFs, this typically improves accuracy and speeds up classification.
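
The choice between the two options above can be expressed as a small decision rule. This is only an illustrative sketch: `choose_preprocessing`, the extension list, and the `has_embedded_text` flag are assumptions, and how you detect embedded text is left to your own tooling.

```python
from pathlib import PurePosixPath

# Assumed set of image formats; adjust to the files you actually receive.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp"}


def choose_preprocessing(filename: str, has_embedded_text: bool) -> str:
    """Suggest which pre-LLM option fits a file.

    Returns "ocr" for images and scanned PDFs, "text_extraction" for
    machine-readable PDFs, and "none" for anything else.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "ocr"
    if suffix == ".pdf":
        # A PDF without embedded text is treated as a scan.
        return "text_extraction" if has_embedded_text else "ocr"
    return "none"
```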

Add Variable

You can add optional variables to further customize how the document is processed.

  • Instructions: Add specific guidance for the AI model, for instance:

    “Extract only the sender’s name, invoice date, and total amount.”

    “Classify based on business department, not document format.”

    This variable helps fine-tune how the model interprets the document.

  • Thread ID: Assign a Thread ID if you’re processing related documents (e.g., a batch of forms for the same client). The model can maintain continuity across multiple documents and reuse context for consistent results.

2. Test Results

Once the inputs are configured, click Test Node to preview the output. Results are displayed in structured JSON format, including:

  • Document classification results (type, category, sensitivity, etc.)

  • Extracted attributes or entities

  • Confidence scores

  • Reference IDs and processing time

For example:

{
  "type": "document_type",
  "label": "Invoice",
  "confidence": 0.96,
  "extracted_fields": {
    "Invoice No": "INV-2045",
    "Amount": "₹12,500",
    "Date": "2024-06-12"
  }
}

You can switch between Raw, Pretty, or Table views to inspect the result.
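
A downstream step will often read the label and confidence from this JSON before deciding what to do next. The sketch below parses the example result shown above and falls back to manual review when confidence is low; the `route_result` helper and the 0.8 threshold are assumptions for illustration, not Nected defaults.

```python
import json

# The example result from above, as a raw JSON string.
RAW_RESULT = """
{
  "type": "document_type",
  "label": "Invoice",
  "confidence": 0.96,
  "extracted_fields": {
    "Invoice No": "INV-2045",
    "Amount": "₹12,500",
    "Date": "2024-06-12"
  }
}
"""


def route_result(raw: str, min_confidence: float = 0.8) -> str:
    """Return a lowercase routing key, or "manual_review" if confidence is low."""
    result = json.loads(raw)
    confidence = float(result.get("confidence", 0.0))
    # The threshold is an assumed value; tune it for your own documents.
    if confidence < min_confidence:
        return "manual_review"
    return str(result.get("label", "unknown")).lower()


print(route_result(RAW_RESULT))  # prints "invoice"
```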

3. Settings

This tab controls the behavior and performance of your classification request.

  • Timeout for API (s): Time limit for the document classification request.

  • Timeout for Webhook/Cron (s): Time limit for scheduled or webhook-triggered tasks.

  • Continue on Error: Allow the workflow to move forward even if the classification fails.

  • Max Tokens: Limits the length of AI responses.

  • Temperature: Controls the randomness of the model’s output. Use lower values for predictable, repeatable results and higher values for more open-ended classification.

  • Max Retries: Sets how many times Nected retries a failed request.

  • Metadata Required: Enable to include metadata like file name, type, or page count in the response.

  • Cache & Expiry: Cache processed document results to optimize repeated classification or re-runs.
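
One plausible reading of how Max Retries and Continue on Error interact is sketched below. It is an illustration only: `call_with_retries`, its parameters, and the optional pause between attempts are assumptions, and Nected's internal behavior may differ.

```python
import time
from typing import Any, Callable, Optional


def call_with_retries(
    request: Callable[[], Any],
    max_retries: int = 2,
    continue_on_error: bool = False,
    backoff_seconds: float = 0.0,
) -> Optional[Any]:
    """Run request(), retrying on failure up to max_retries extra times.

    If every attempt fails, return None when continue_on_error is set
    (so the workflow can move on); otherwise re-raise the last error.
    """
    attempts = max(1, max_retries + 1)  # one initial try plus the retries
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return request()
        except Exception as exc:  # broad on purpose: any failure counts
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(backoff_seconds)
    if continue_on_error:
        return None
    assert last_error is not None
    raise last_error
```

Under this reading, Max Retries = 2 allows up to three attempts in total, and Continue on Error only takes effect after the last attempt has failed.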

Practical Examples

  1. Automated Document Sorting: Upload mixed files (invoices, ID proofs, contracts) and let Nected auto-categorize them using the Document Type classifier before saving them to their respective folders.

  2. Sensitive Content Filtering: Scan large batches of uploaded PDFs using Sensitivity Analysis to automatically detect and flag confidential or PII-containing files.

  3. Attribute Extraction for Operations: Use Attributes Extraction to pull structured data from bills or forms (e.g., vendor name, invoice total, and payment date) and forward that data to your ERP or CRM.

  4. Custom Business Classifications: Configure Custom Categories to align AI classification with your internal labels, such as “Compliance Docs”, “Customer KYC”, or “Audit Reports”.
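
The sorting and extraction examples above usually need a small amount of glue code after the node: mapping a label to a destination and normalizing extracted values before they reach an ERP or CRM. The sketch below shows one way to do that; the folder names, helper functions, and output keys are hypothetical, and the input keys follow the sample result shown under Test Results.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical mapping from classifier labels to storage folders.
FOLDER_BY_LABEL = {
    "invoice": "finance/invoices",
    "contract": "legal/contracts",
    "id proof": "kyc/id-proofs",
}


def target_folder(label: str) -> str:
    """Pick a folder for a label, defaulting to "unsorted"."""
    return FOLDER_BY_LABEL.get(label.strip().lower(), "unsorted")


def parse_amount(text: str) -> Decimal:
    """Strip currency symbols and separators, e.g. "₹12,500" -> 12500.

    Assumes "," is a thousands separator and "." the decimal point.
    """
    digits = re.sub(r"[^\d.]", "", text)
    if not digits:
        raise ValueError(f"No numeric amount found in {text!r}")
    return Decimal(digits)


def normalize_fields(fields: dict) -> dict:
    """Convert extracted strings into typed values for downstream systems."""
    return {
        "invoice_no": fields["Invoice No"],
        "amount": parse_amount(fields["Amount"]),
        "date": date.fromisoformat(fields["Date"]),
    }
```

Keeping the label mapping in one dictionary makes it easy to add internal categories later without touching the routing logic.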
