Use Pytesseract to Extract Text into Table Arrays Given the Coordinates of the Table Structure

Asked Jul 17 '18 at 15:15

Active Feb 23 '20 at 03:50

Viewed 344 times

I want to extract text from a scanned table with Tesseract and put it into arrays that have the same structure as the table.

I have already used OpenCV to detect the table structure and obtained the coordinates of the table joints as well as the entire table structure (stored in an np.array).
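To make the input concrete, here is a minimal sketch of how joint coordinates could be collapsed into sorted grid lines. It assumes the joints are available as (x, y) pixel pairs and that the table is a regular grid; the name `grid_lines` and the tolerance `tol` are made up for illustration and are not part of OpenCV or pytesseract.

```python
def grid_lines(joints, tol=5):
    """Collapse joint (x, y) points into sorted column and row boundaries.

    Coordinates that lie within `tol` pixels of their neighbour are treated
    as belonging to the same grid line (assumption: scan noise is small).
    """
    def cluster(values):
        groups = []
        for v in sorted(values):
            # Start a new group when the gap to the previous value exceeds tol
            if not groups or v - groups[-1][-1] > tol:
                groups.append([v])
            else:
                groups[-1].append(v)
        # Represent each group by its rounded mean position
        return [round(sum(g) / len(g)) for g in groups]

    xs = cluster(int(x) for x, _ in joints)
    ys = cluster(int(y) for _, y in joints)
    return xs, ys
```

Each row of an N x 2 np.array unpacks as an (x, y) pair, so an array of joints can be passed in directly.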

For example, for the table in this picture:

I want pytesseract to store it into:

my_table = [[x, y, 1, 3],
            [x, a, 2, 3],
            [x, a, 2, 3],
            [x, z, 2, 3]]

I have used commercial OCR software, and it always detects the table structure first and then recognizes and extracts the text into that detected structure.

How do I accomplish the second step with pytesseract? Answers using Tesseract in other languages are great as well.
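One possible shape for that second step, as a rough sketch rather than a confirmed solution: crop each cell from the image using the grid-line positions and run OCR on the crop. It assumes a regular grid with no merged cells, that `xs` and `ys` are sorted pixel positions of the vertical and horizontal lines, and that the image supports NumPy-style slicing. The names `ocr_cell`, `extract_table` and `pad` are hypothetical; `pytesseract.image_to_string` and the `--psm 7` (single text line) option are the only library features used.

```python
def ocr_cell(cell):
    """Default OCR step: read one cropped cell with pytesseract."""
    # Imported lazily so the grid logic itself has no hard dependency
    import pytesseract
    # --psm 7 asks Tesseract to treat the crop as a single line of text
    return pytesseract.image_to_string(cell, config="--psm 7").strip()


def extract_table(image, xs, ys, ocr=ocr_cell, pad=2):
    """Return a list of rows, each a list of cell strings.

    image: array-like that supports image[y0:y1, x0:x1] (e.g. an np.array)
    xs, ys: sorted positions of the vertical and horizontal grid lines
    pad: pixels trimmed from each side so the ruling lines are not OCR'd
    """
    table = []
    # Consecutive pairs of horizontal lines bound one table row
    for top, bottom in zip(ys[:-1], ys[1:]):
        row = []
        # Consecutive pairs of vertical lines bound one cell in that row
        for left, right in zip(xs[:-1], xs[1:]):
            cell = image[top + pad:bottom - pad, left + pad:right - pad]
            row.append(ocr(cell))
        table.append(row)
    return table
```

Calling `extract_table(img, xs, ys)` would give a nested list with the same row and column layout as `my_table`. Merged cells or skewed scans would need extra handling that this sketch does not attempt.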

edited Feb 23 '20 at 03:50

Cœur

asked Jul 17 '18 at 15:15

Bec Zhao

0 Answers