Docs-Insights is a decentralized system built for advanced document-processing tasks. It combines multiple AI models, including vision models, language models, vision-language models (VLMs), and OCR engines, to accurately understand and extract information from documents. The subnet aims to offer a powerful, open-source alternative to proprietary tools, making document comprehension more accessible and efficient. By delivering key insights with a single click, it significantly reduces the time and effort required for document review.
- Checkbox and Associated Text Detection - Currently live and operational on SN-84, outperforming industry standards like GPT-4 Vision and Azure Form Recognizer.
- Highlighted and Encircled Text Detection - Detects and extracts highlighted or circled text segments accurately (Under Development).
- Document Classification - Automatically classifies documents by type (e.g., receipts, forms, letters). This feature is live on SN-84 and powered by the Donut model, a cutting-edge, OCR-free architecture.
- Document Parsing - Leverages powerful LLMs to extract key entities like names, addresses, phone numbers, and monetary values. Documents are intelligently segmented into logical sections for improved clarity. Live on SN-84.
- JSON Data Structuring - Compiles and formats the extracted data into a concise, readable JSON file, significantly reducing document review time (a hypothetical example of such output is sketched below).
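As a rough illustration only, the snippet below shows how extraction results for the tasks listed above could be compiled into a single JSON file. The field names and layout here are assumptions made for this sketch, not the subnet's actual output schema.

```python
import json

# Hypothetical extraction results; the real output schema may differ.
extracted = {
    "document_class": "form",
    "checkboxes": [
        {"checked": True, "bbox": [102, 340, 118, 356], "associated_text": "I agree to the terms"},
        {"checked": False, "bbox": [102, 372, 118, 388], "associated_text": "Subscribe to newsletter"},
    ],
    "entities": {
        "name": "Jane Doe",
        "phone": "+1-555-0100",
        "total_amount": "42.50",
    },
}

# Compile everything into a concise, readable JSON file.
with open("document_insights.json", "w") as f:
    json.dump(extracted, f, indent=2)
```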
The system consists of two primary components:
- Validator
  - Equipped with synthetic data generation:
    - The validator first selects one of three tasks: "checkbox", "doc-class", or "doc-parse".
    - It then randomly generates an image, together with the corresponding ground-truth data, for the selected task.
    - The image is sent to the miner for processing.
- Miner
  - checkbox
    - Vision Model: Processes the image to detect checkboxes and returns their coordinates.
    - OCR Engine and Preprocessor: Extracts text from the image, organizes it into lines, and records the coordinates of each line.
    - Post-Processor: Combines the checkbox and text coordinates to associate text with each checkbox (a minimal association sketch follows this list).
  - doc-class
    - VLM (Donut): Processes the image to determine the document's class. No OCR engine or post-processor is needed here (an illustrative Donut example follows this list).
  - doc-parse
    - OCR Engine: Extracts text from the image and organizes it into lines.
    - LLMs: Analyze the extracted text, parse it into main sections, and fill each section with the relevant information.
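The post-processing step for the checkbox task is only described at a high level above, so the following is a minimal sketch of one plausible approach rather than the repository's actual implementation. It assumes the vision model returns checkbox bounding boxes and the OCR stage returns text lines with their own boxes, and it pairs each checkbox with the nearest text line that sits roughly on the same row.

```python
import math

def box_center(box):
    """Center point of an [x1, y1, x2, y2] bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def associate_text_with_checkboxes(checkboxes, text_lines):
    """Pair each detected checkbox with the closest OCR text line on its row.

    checkboxes: list of [x1, y1, x2, y2] boxes from the vision model.
    text_lines: list of {"text": str, "bbox": [x1, y1, x2, y2]} from OCR.
    Returns a list of {"checkbox": box, "associated_text": str} records.
    """
    results = []
    for cb in checkboxes:
        cb_cx, cb_cy = box_center(cb)
        cb_height = cb[3] - cb[1]
        best_line, best_dist = None, float("inf")
        for line in text_lines:
            tx, ty = box_center(line["bbox"])
            # Skip lines that are clearly on a different row than the checkbox.
            if abs(ty - cb_cy) > cb_height:
                continue
            dist = math.hypot(tx - cb_cx, ty - cb_cy)
            if dist < best_dist:
                best_line, best_dist = line, dist
        results.append({
            "checkbox": cb,
            "associated_text": best_line["text"] if best_line else "",
        })
    return results
```

A distance threshold or a left-to-right reading-order heuristic could be layered on top, but the basic idea stays the same: geometry from the vision model and from OCR is merged in a post-processing pass.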
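For the doc-class task the text above only names Donut as the model. The snippet below shows the standard Hugging Face transformers usage of a publicly available Donut checkpoint fine-tuned for document classification (RVL-CDIP); the checkpoint, task prompt, and label set are illustrative and may differ from what the subnet's miners actually run.

```python
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Public document-classification checkpoint, used here purely for illustration.
ckpt = "naver-clova-ix/donut-base-finetuned-rvlcdip"
processor = DonutProcessor.from_pretrained(ckpt)
model = VisionEncoderDecoderModel.from_pretrained(ckpt)

image = Image.open("document.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Donut is OCR-free: the task is selected with a special prompt token.
task_prompt = "<s_rvlcdip>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, ""
)
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task prompt token
print(processor.token2json(sequence))  # e.g. {"class": "letter"}
```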
- Example workflow (checkbox task):
  - The Validator generates an image and its ground truth, keeps the ground-truth file, and sends the image to the miner.
  - The Miner processes the image with its models and post-processors, then returns the output to the Validator.
  - The Validator evaluates the result based on:
    - Accuracy: scored on the overlap between the detected bounding-box coordinates and the ground truth, and on text-content matching (a scoring sketch follows this list).
  - The Validator then rewards the top-performing miner when its score is at least 5% better than that of the second-best miner.
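The exact scoring formula is not given here, so the sketch below makes its assumptions explicit: bounding-box overlap is measured as IoU against the ground truth, text matching is a case-insensitive exact comparison, the two are weighted equally per checkbox, and the 5% margin described above decides whether the top miner counts as a clear winner. The actual validator code may weight or threshold these differently.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def score_checkbox_response(predictions, ground_truth):
    """Average of box IoU and text match over the ground-truth checkboxes.

    predictions / ground_truth: lists of {"bbox": [...], "associated_text": str}.
    Ground-truth entries with no matching prediction contribute zero.
    """
    if not ground_truth:
        return 0.0
    total = 0.0
    for gt in ground_truth:
        best = 0.0
        for pred in predictions:
            box_score = iou(pred["bbox"], gt["bbox"])
            text_score = float(
                pred["associated_text"].strip().lower()
                == gt["associated_text"].strip().lower()
            )
            best = max(best, 0.5 * box_score + 0.5 * text_score)
        total += best
    return total / len(ground_truth)

def is_clear_winner(scores):
    """True if the best miner beats the runner-up by at least 5%."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return True
    return ranked[0] >= 1.05 * ranked[1]
```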
To set up the Document Understanding project:
- Clone the repository:

      git clone https://github.com/TatsuProject/Document_Understanding_Subnet.git
      cd Document_Understanding_Subnet

- Install required dependencies:

      pip install -r requirements.txt
      pip install -e .

- Install Tesseract (for miners only):

      sudo apt-get install tesseract-ocr
- Install and run the AI models (for miners only):
  Follow the steps at https://github.com/TatsuProject/document_insights_base_model to install the service.
  After installation, ensure the service is running on the same machine as the miner.
- Start the Validator (testnet):

      python3 neurons/validator.py --netuid 236 --subtensor.network test --wallet.name validator --wallet.hotkey default --logging.debug

- Start the Miner (testnet):

      python3 neurons/miner.py --netuid 236 --subtensor.network test --wallet.name miner --wallet.hotkey default --logging.debug

- Start the Validator (mainnet):

      python3 neurons/validator.py --netuid 84 --subtensor.network finney --wallet.name validator --wallet.hotkey default --logging.debug

- Start the Miner (mainnet):

      python3 neurons/miner.py --netuid 84 --subtensor.network finney --wallet.name miner --wallet.hotkey default --logging.debug
For more in-depth information, refer to the Technical Guide.
This project is licensed under the MIT License - see the LICENSE file for details.