Document Classifier
The Document Classifier Node lets your workflows read, categorize, and extract information from documents such as PDFs, Word files, or text-based reports. Use it to automate document understanding at scale, whether that means sorting invoices by type, detecting content sensitivity, or pulling attributes like customer names or transaction values.
Before using this node, connect your workspace to a valid AI provider (e.g., OpenAI, GoogleAI, Anthropic, or Vertex AI). For setup steps, see the guide on integrating AI providers.
Configuration Tabs
1. Input Params
Connector
Select the connected AI provider you want to use (e.g., a connector named “Financial Advisor asst”). This determines which model or assistant will handle the document classification task.
Assistant ID/Model name
You can run this task with a standard model or with one of your AI assistants. To reuse a custom assistant from your provider (such as one built with OpenAI’s Assistants API), select its Assistant ID; this carries over the assistant’s pre-configured tasks, instructions, or datasets. To use a standard model instead, enter only the model name.
Document URL
Provide the direct URL of the document you want to analyze. You can also pass a token-based dynamic link generated from previous nodes (e.g., a document uploaded through an API or fetched from storage).
OpenAI File ID (alternative to Document URL)
If you’re using OpenAI’s file-based API, you can enter the File ID here instead of a URL. This is ideal when you’ve pre-uploaded files directly into your OpenAI workspace.
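Because the node takes either a Document URL or an OpenAI File ID, it can help to check the inputs in an upstream step before the node runs. The following is a minimal sketch of such a check. The function name `resolve_document_source` and the keys `document_url` and `file_id` are illustrative assumptions, not Nected field names.

```python
from urllib.parse import urlparse


def resolve_document_source(document_url=None, file_id=None):
    """Return a dict naming the single document source to use.

    Hypothetical helper: the keys "document_url" and "file_id" are
    illustrative and are not taken from Nected's API.
    """
    has_url = bool(document_url and document_url.strip())
    has_file_id = bool(file_id and file_id.strip())

    if has_url == has_file_id:
        # Both were given, or neither was: the source is ambiguous.
        raise ValueError("Provide exactly one of document_url or file_id")

    if has_url:
        parsed = urlparse(document_url.strip())
        # A direct link needs an http(s) scheme and a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a direct http(s) URL: {document_url!r}")
        return {"document_url": document_url.strip()}

    return {"file_id": file_id.strip()}
```

Rejecting the "both" and "neither" cases early keeps an ambiguous request from reaching the AI provider at all.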

Classification Type
This determines the type of analysis or classification that will be performed on your document. Each option enables a distinct document intelligence capability:
| Classification Type | What It Does | Example Use Cases |
| --- | --- | --- |
| Document Type | Automatically identifies what type of document it is (invoice, report, contract, ID proof, etc.). | Sorting uploaded documents, detecting invalid uploads |
| Content Category | Analyzes the topic or subject of the document’s content. | Categorizing reports into “Financial”, “Legal”, “Marketing”, etc. |
| Sensitivity Analysis | Detects whether the document contains sensitive information (like PII, financial details, or restricted data). | Flagging confidential documents for restricted access |
| Language Detection | Identifies the language(s) present in the document. | Sorting multilingual submissions, routing translation tasks |
| Quality Assessment | Evaluates whether the document meets quality criteria such as clarity, completeness, or correctness. | Validating OCR scans, checking data completeness |
| Custom Categories | Lets you define your own custom document classes via prompts or assistant instructions. | Categorizing documents into internal business-specific labels (e.g., “KYC Docs”, “Tax Filings”, “Client Agreements”) |
| Attributes Extraction | Extracts structured data points from the document like name, ID, amount, or date. | Pulling invoice totals, contract dates, or policy numbers |
Additional Options
Image/PDF Pre-LLM OCR
Enable this option if your document is an image or a scanned PDF. When turned on, Nected will first apply Optical Character Recognition (OCR) to convert the visual content into text before sending it to the AI model. Use this for:
Scanned documents
Image-based reports or receipts
Handwritten or camera-captured forms
PDF Pre-LLM Text Extraction
Enable this to extract embedded text from a PDF before it reaches the AI model. This ensures better accuracy and faster classification for machine-readable PDFs.
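The choice between the two options above depends on whether the file already contains machine-readable text. As a rough sketch, a script that prepares files for the workflow could suggest an option from the file type. The function name and the return values `"ocr"`, `"text_extraction"`, and `"none"` are illustrative labels, not Nected setting names, and the `is_scanned` flag is assumed to come from whoever uploads the file.

```python
import mimetypes


def choose_preprocessing(filename, is_scanned=False):
    """Suggest which pre-LLM option to enable for a file.

    Returns "ocr", "text_extraction", or "none". These are illustrative
    labels only; they do not correspond to Nected identifiers.
    """
    mime_type, _ = mimetypes.guess_type(filename)

    if mime_type and mime_type.startswith("image/"):
        # Images carry no embedded text, so OCR is the only way to read them.
        return "ocr"
    if mime_type == "application/pdf":
        # Scanned PDFs behave like images; digital PDFs have embedded text.
        return "ocr" if is_scanned else "text_extraction"
    return "none"
```

The file extension alone cannot tell a scanned PDF from a digital one, which is why the sketch takes that information as an explicit argument.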

Add Variable
You can add optional variables to further customize how the document is processed.
Instructions: Add specific guidance for the AI model. For instance:
“Extract only the sender’s name, invoice date, and total amount.”
“Classify based on business department, not document format.”
This variable helps fine-tune how the model interprets the document.
Thread ID: Assign a Thread ID if you’re processing related documents (e.g., a batch of forms for the same client). The model can then maintain continuity across those documents and reuse context for consistent results.
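When the Instructions variable is used together with Custom Categories, the instruction text often needs to list the allowed labels. The sketch below builds such a text from a list of labels. The helper `build_instructions` and the fallback label `"Other"` are assumptions made for illustration; the node itself only receives the resulting string.

```python
def build_instructions(categories, fallback="Other"):
    """Build an instruction string that restricts the model to given labels.

    Hypothetical helper: the wording and the fallback label are examples,
    not text required by Nected or by any AI provider.
    """
    cleaned = [c.strip() for c in categories if c and c.strip()]
    if not cleaned:
        raise ValueError("At least one category is required")

    labels = ", ".join(f'"{c}"' for c in cleaned)
    return (
        f"Classify the document into exactly one of these labels: {labels}. "
        f'If none fits, answer "{fallback}".'
    )


print(build_instructions(["KYC Docs", "Tax Filings", "Client Agreements"]))
```

Naming an explicit fallback label gives the model a defined answer for documents that match none of your categories.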
2. Test Results
Once the inputs are configured, click Test Node to preview the output. Results are displayed in structured JSON format, including:
Document classification results (type, category, sensitivity, etc.)
Extracted attributes or entities
Confidence scores
Reference IDs and processing time
For example:
{
  "type": "document_type",
  "label": "Invoice",
  "confidence": 0.96,
  "extracted_fields": {
    "Invoice No": "INV-2045",
    "Amount": "₹12,500",
    "Date": "2024-06-12"
  }
}
You can switch between Raw, Pretty, or Table views to inspect the result.
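A downstream step will usually act on the label and its confidence score. The sketch below parses the sample result shown above and sends low-confidence results to manual review. The 0.85 threshold and the action names `auto_accept` and `manual_review` are assumptions, and the field names are taken from the sample only; the actual output may differ by classification type.

```python
import json

REVIEW_THRESHOLD = 0.85  # assumed cut-off; tune it for your own documents


def route_result(raw_json, threshold=REVIEW_THRESHOLD):
    """Decide what to do with one classification result.

    Field names ("label", "confidence", "extracted_fields") follow the
    sample output; the action names are illustrative.
    """
    result = json.loads(raw_json)
    label = result.get("label")
    confidence = float(result.get("confidence", 0.0))
    fields = result.get("extracted_fields", {})

    if label is None or confidence < threshold:
        action = "manual_review"
    else:
        action = "auto_accept"
    return {"action": action, "label": label, "fields": fields}


sample = """
{
  "type": "document_type",
  "label": "Invoice",
  "confidence": 0.96,
  "extracted_fields": {
    "Invoice No": "INV-2045",
    "Amount": "₹12,500",
    "Date": "2024-06-12"
  }
}
"""

decision = route_result(sample)
print(decision["action"], decision["label"])  # prints: auto_accept Invoice
```

Treating a missing confidence value as 0.0 means incomplete results fall into manual review instead of being accepted silently.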
3. Settings
This tab controls the behavior and performance of your classification request.
Timeout for API (s): Time limit for the document classification request.
Timeout for Webhook/Cron (s): Time limit for scheduled or webhook-triggered tasks.
Continue on Error: Allow the workflow to move forward even if the classification fails.
Max Tokens: Caps the length of the AI response.
Temperature: Controls how random the model’s output is. Use lower values for predictable, repeatable results and higher values for more open-ended classification.
Max Retries: Sets how many times Nected retries a failed request.
Metadata Required: Enable to include metadata like file name, type, or page count in the response.
Cache & Expiry: Cache processed document results to optimize repeated classification or re-runs.
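To make the interplay of Max Retries and Continue on Error concrete, here is a generic retry loop. It is an illustration of the general pattern only: Nected's own retry, timeout, and error handling are not described in this page and may behave differently. The function name, its parameters, and the shape of the returned dictionary are all assumptions.

```python
import time


def run_with_retries(task, max_retries=2, continue_on_error=False, delay_s=0.0):
    """Call task(); retry up to max_retries times after a failure.

    Generic illustration only, not Nected's implementation.
    """
    last_error = None
    for attempt in range(max_retries + 1):  # first try plus the retries
        try:
            return {"ok": True, "value": task(), "attempts": attempt + 1}
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                time.sleep(delay_s)  # wait before the next attempt

    if continue_on_error:
        # Report the failure but let the caller carry on.
        return {"ok": False, "error": str(last_error), "attempts": max_retries + 1}
    raise last_error
```

With `continue_on_error=True` the caller receives a failure record instead of an exception, which mirrors a workflow that moves forward after a failed classification.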
Practical Examples
Automated Document Sorting: Upload mixed files (invoices, ID proofs, contracts) and let Nected auto-categorize them using the Document Type classifier before saving them to their respective folders.
Sensitive Content Filtering: Scan large batches of uploaded PDFs using Sensitivity Analysis to automatically detect and flag confidential or PII-containing files.
Attribute Extraction for Operations: Use Attributes Extraction to pull structured data from bills or forms (e.g., vendor name, invoice total, and payment date) and forward that data to your ERP or CRM.
Custom Business Classifications: Configure Custom Categories to align AI classification with your internal labels, such as “Compliance Docs”, “Customer KYC”, or “Audit Reports”.
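The document sorting scenario can be sketched as a small grouping step that runs after classification. The folder paths, the `FOLDER_BY_LABEL` mapping, and the `sort_documents` helper are hypothetical; only the `label` field comes from the sample output, and real results may use different labels.

```python
from collections import defaultdict

# Hypothetical mapping from classifier labels to storage folders.
FOLDER_BY_LABEL = {
    "Invoice": "finance/invoices",
    "Contract": "legal/contracts",
    "ID Proof": "kyc/id-proofs",
}


def sort_documents(results, default_folder="unsorted"):
    """Group file names by target folder, based on each result's label.

    `results` maps a file name to its classification result (a dict
    with a "label" key, as in the sample output).
    """
    folders = defaultdict(list)
    for file_name, result in results.items():
        folder = FOLDER_BY_LABEL.get(result.get("label"), default_folder)
        folders[folder].append(file_name)
    return dict(folders)
```

Files with an unknown or missing label land in the default folder, so unexpected document types are kept for inspection instead of being dropped.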