When converting PDF to Excel with Omnipage or Abbyy Finereader, is there a way to stop it from splitting individual cells?

Question

I'm trying to extract some tables from PDF files, and both tools (Abbyy and Omnipage) do a pretty good job of identifying the tables. But when it comes to identifying the rows and columns, they both make the same mistakes.

Usually, the problem comes when they create a partial row, splitting just one cell horizontally, but not the others. For an example of what I mean, see the attached image. In the column on the left, some of the cells are split in half, which makes the table difficult to work with in Excel.

I find it odd that these programs do this in the first place, since tables with split cells are always a pain.

Is there a way of telling these programs to set only full columns and rows, and not split individual cells?

Any suggestions for other solutions?

Are you trying to automate the OCR from your own application, or are you looking for an end-user application? If the latter, you would do better to ask on Stack Exchange. – Eugene, Mar 23 '16 at 11:00

score 1 · Accepted Answer · answered Apr 18 '16 at 04:13

ABBYY has many OCR products; the configurable ones are FineReader Engine and FlexiLayout Studio. The other ABBYY products do not have the requested settings.

Nadia Solovyeva

Thanks. Very helpful. I was not aware of those products. – mgalka Apr 19 '16 at 07:55
