- 
                Notifications
    You must be signed in to change notification settings 
- Fork 10.3k
4.0 Accuracy and Performance
| Engine | Total char errors | Word Recall Errors | Word Precision Errors | Walltime | CPUtime* | 
|---|---|---|---|---|---|
| Tess 3.04 | 13.9 | 30 | 31.2 | 3.0 | 2.8 | 
| Cube | 15.1 | 29.5 | 30.7 | 3.4 | 3.1 | 
| Tess+Cube | 11.0 | 24.2 | 25.4 | 5.7 | 5.3 | 
| LSTM | 7.6 | 20.9 | 20.8 | 1.5 | 2.5 | 
Note in the above table that LSTM is faster than Tess 3.04 (without adding cube) in both wall time and CPU time! For wall time by a factor of 2.
test on HP Z420 on a single Hindi page
| Engine Mode | Test Mode | Real User | 
|---|---|---|
| Default (-oem 3 = cube + tess) | 7.6 | 7.3 | 
| Base Tess (-oem 0) | 2.9 | 2.6 | 
| Cube (-oem 1) | 5.4 | 4.9 | 
| LSTM With OpenMP+AVX | 1.8 | 3.8 | 
| LSTM No OpenMP with AVX | 2.7 | 2.4 | 
| LSTM No OpenMP with SSE | 3.1 | 2.7 | 
| LSTM No OpenMP no SIMD at all | 4.6 | 4.1 | 
My first test with a simple screenshot gave significant better results with LSTM, but needed 16 minutes CPU time (instead of 9 seconds) with a debug build of Tesseract (-O0). A release build (-O2) needs 17 seconds with LSTM, 4 seconds without for the same image.
The slow speed with debug is to be expected. The new code is much more memory intensive, so it is a lot slower on debug (also openmp is turned off by choice on debug). The optimized build speed sounds about right for Latin-based languages. It is the complex scripts that will run faster relative to base Tesseract.
Ref: https://github.com/tesseract-ocr/tesseract/issues/40
Old wiki - no longer maintained. The pages were moved, see the new documentation.
These wiki pages are no longer maintained.
All pages were moved to tesseract-ocr/tessdoc.
The latest documentation is available at https://tesseract-ocr.github.io/.