I need to OCR old statistical tables that contain numerical values for each town in a given area. I use Tesseract 4.0.0-beta.3. In most cases I get acceptable results, but in others the software fails to recognise the structure of the table and skips rows or entire columns.
I tried to find a more suitable configuration by checking --help-psm, but honestly I couldn't figure out which mode would improve my results. I also tried slicing the tables up into individual columns, but the results were even worse. I suspect the issue is that some cells contain only 1- or 2-digit numbers, so the rows are deemed too short; that is usually a sensible heuristic, but here it is rather problematic. What settings would you use to optimise the results?
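
To make the comparison concrete, here is a minimal sketch of how several page segmentation modes could be run against the same scan so that their output can be compared side by side. It assumes the tesseract executable is on the PATH; the file name table.png, the helper names build_command and compare_psms, and the choice of modes 4, 6 and 11 are only illustrative assumptions, not a recommendation.

```python
import shutil
import subprocess

# Modes picked for illustration only (an assumption, not a recommendation):
# 4 = single column of text of variable sizes
# 6 = single uniform block of text
# 11 = sparse text, find as much text as possible in no particular order
CANDIDATE_PSMS = (4, 6, 11)


def build_command(image_path, psm):
    """Return the tesseract CLI invocation for one page segmentation mode."""
    # Using "stdout" as the output base makes tesseract print the recognised text.
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def compare_psms(image_path, psms=CANDIDATE_PSMS):
    """Run tesseract once per mode and return a dict {psm: recognised text}."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    results = {}
    for psm in psms:
        completed = subprocess.run(
            build_command(image_path, psm),
            capture_output=True,  # requires Python 3.7+
            text=True,
            check=True,
        )
        results[psm] = completed.stdout
    return results
```

Calling compare_psms("table.png") and counting the lines returned for each mode would give a quick indication of which mode drops the fewest rows on a given page.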