
I'm trying to extract some tables from PDF files, and both tools I've tried (ABBYY and OmniPage) do a pretty good job of identifying the tables. But when it comes to identifying the rows and columns, they both make the same mistakes.

Usually, the problem comes when they create a partial row, splitting just one cell horizontally, but not the others. For an example of what I mean, see the attached image. In the column on the left, some of the cells are split in half, which makes the table difficult to work with in Excel.

I find it odd that these programs do this in the first place, since tables with split cells are always a pain.

Is there a way of telling these programs to set only full columns and rows, and not split individual cells?

Any suggestions for other solutions?

[Image: example table in which some cells in the left column are split in half]

mgalka
  • Are you trying to automate the OCR from your own application, or are you looking for an end-user application? If the latter, you would do better to ask on Stack Exchange. – Eugene Mar 23 '16 at 11:00

1 Answer


ABBYY has a lot of OCR products; the configurable ones are FineReader Engine and FlexiLayout Studio. Other ABBYY products do not have the settings you are asking for.
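If the product you have does not expose such a setting, one possible workaround is to clean up the exported table afterwards. The following is only a minimal sketch in plain Python, not a feature of ABBYY or OmniPage. It assumes the table has already been exported (for example to CSV) and loaded as a rectangular list of rows, and that a "partial row" is one in which at most one cell has text. The function name `merge_partial_rows` and its parameters are hypothetical.

```python
def merge_partial_rows(rows, max_filled=1, sep=" "):
    """Fold partial rows into the row above them.

    Assumption: a row with at most `max_filled` non-empty cells is a
    fragment of the previous row (a cell that the OCR tool split in two).
    Assumption: all rows have the same number of columns.
    """
    merged = []
    for row in rows:
        # Indexes of the cells that actually contain text.
        filled = [i for i, cell in enumerate(row) if cell.strip()]
        if not filled:
            continue  # drop completely empty rows
        if merged and len(filled) <= max_filled:
            # Fragment: append its text to the same column of the previous row.
            prev = merged[-1]
            for i in filled:
                prev[i] = (prev[i] + sep + row[i].strip()).strip()
        else:
            # Full row: keep it as a new, whitespace-trimmed row.
            merged.append([cell.strip() for cell in row])
    return merged


rows = [
    ["Item", "Qty", "Price"],
    ["Stainless steel", "2", "10.00"],
    ["bolt M8", "", ""],  # fragment created by a split cell
    ["Washer", "5", "1.50"],
]
cleaned = merge_partial_rows(rows)
```

With this heuristic, a genuine row that happens to have only one filled cell would also be merged into the row above, so the threshold may need adjusting for a particular table.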

Nadia Solovyeva