I want to extract numeric data from an image of a table (png/jpeg/etc.) using Python. I don't mind if it's some deep learning algorithm but it doesn't have to be if there is already an existing library.

I've tried various scripts that I found online. Most of them are some variation on cv2 plus pytesseract. One such example is here [link]. They work for simple tables and for the sample files used in the algorithm descriptions themselves, but they don't work well for the general tables I want to process; one example is below.

[image: example table]

Does anyone know any other table recognition scripts/libraries that I can just use out of the box? Thanks.

Christoph Rackwitz
tpoh
  • Did you try ABBYY FineReader or another kind of OCR? – Luis Alejandro Vargas Ramos Aug 04 '22 at 01:25
  • Could you please be a bit more specific about what does not work well? Is the code successful at delineating the cells? Are you able to detect anything at all? – Sheldon Aug 04 '22 at 02:05
  • @Sheldon the code at the link I posted runs without errors, but not all cells are delineated. Maybe the grey/white/dark grey colors in the table are confusing it? The detected output is a 5 by 11 grid, whereas the initial image is 22 by 9. Only two cells in the 5 by 11 grid are populated with numbers; the rest are blank. – tpoh Aug 04 '22 at 14:55
  • @LuisAlejandroVargasRamos I'm not familiar with this reader. Is there a link with an example you can share? Thanks. – tpoh Aug 04 '22 at 14:56
  • Have you found a solution? – Lidor Eliyahu Shelef Dec 19 '22 at 13:04
  • @LidorEliyahuShelef I have not. There are websites like https://www.extracttable.com/, but I'm still looking for a solution that I can run myself rather than through an external website. – tpoh Dec 23 '22 at 17:45

0 Answers