This project aims to extract tables from scanned image PDFs using Optical Character Recognition.
- Tesseract OCR: sudo apt-get install tesseract-ocr
- ImageMagick: sudo apt-get install imagemagick
- PDF utilities: sudo apt-get install poppler-utils
- Python packages: sudo pip install -r requirements.txt
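
Before running the OCR, it can help to confirm that the command-line tools are actually on your PATH. The following is a minimal sketch, not part of this repository: the helper name `missing_tools` is hypothetical, and the executable names (`tesseract`, `convert`, `pdftoppm`) are assumptions about what the packages above install on a Debian/Ubuntu system. Newer ImageMagick releases may ship `magick` instead of `convert`.

```python
import shutil

# Executables assumed to come from the packages listed above:
# tesseract (tesseract-ocr), convert (imagemagick), pdftoppm (poppler-utils).
REQUIRED_TOOLS = ["tesseract", "convert", "pdftoppm"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is enough.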
- Clear the pdf/ folder and copy all the PDF files to be scanned into it.
- Run the OCR: python3 shellocr.py
- The extracted text files will be available in the txt/ folder once the process completes.
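
For readers curious about what such a pipeline can look like, here is a rough sketch of one way to go from the pdf/ folder to the txt/ folder with the tools installed above. It is an illustration under assumptions, not the contents of shellocr.py: the function names, the intermediate png/ folder, and the 300 DPI setting are all hypothetical. It renders each page with `pdftoppm` and passes the resulting image to `tesseract`.

```python
import subprocess
from pathlib import Path


def page_image_command(pdf_path, image_prefix, dpi=300):
    """Build a pdftoppm command that renders each page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def ocr_command(image_path, output_base):
    """Build a tesseract command; tesseract appends .txt to output_base."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_pdf(pdf_path, work_dir="png", txt_dir="txt"):
    """Render one PDF to page images, then OCR each page into txt_dir."""
    pdf_path = Path(pdf_path)
    work_dir, txt_dir = Path(work_dir), Path(txt_dir)
    work_dir.mkdir(exist_ok=True)  # hypothetical scratch folder for page images
    txt_dir.mkdir(exist_ok=True)
    prefix = work_dir / pdf_path.stem
    subprocess.run(page_image_command(pdf_path, prefix), check=True)
    # pdftoppm names its output <prefix>-<page>.png
    for image in sorted(work_dir.glob(pdf_path.stem + "-*.png")):
        subprocess.run(ocr_command(image, txt_dir / image.stem), check=True)


if __name__ == "__main__":
    for pdf in sorted(Path("pdf").glob("*.pdf")):
        ocr_pdf(pdf)
```

Keeping the command construction in small functions makes it easy to inspect or adjust the exact arguments (for example the resolution) without touching the loop that runs them.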
If the above doesn't work for you, try the alternate method:

- Save your file as input.pdf in the root directory.
- Run the extraction: python3 pdf_miner.py
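
As a rough idea of what a text-based extraction can involve, the sketch below reads input.pdf with the pdfminer.six library and splits each line into cells. This is an assumption based only on the script's name, not a description of pdf_miner.py itself; the function names and the "two or more spaces separate columns" rule are hypothetical. Note that pdfminer.six only returns text that is already embedded in the PDF; it does not perform OCR.

```python
import re


def extract_embedded_text(pdf_path="input.pdf"):
    """Return the text layer of a PDF using pdfminer.six (assumed installed)."""
    # Imported here so the helper below works without pdfminer.six.
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def split_columns(line):
    """Split one text line into cells wherever two or more spaces occur."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


if __name__ == "__main__":
    for line in extract_embedded_text().splitlines():
        cells = split_columns(line)
        if len(cells) > 1:  # keep only lines that look like table rows
            print(cells)
```

The column rule is deliberately simple and will misread tables whose cells contain wide gaps, so treat the output as a starting point for cleanup rather than a finished table.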