How to combine the results of multiple OCR tools to get better text recognition

Question

Imagine you have several OCR tools that read text from images, but none of them gives you 100% accurate output. Combined, however, the results could come very close to the ground truth. What would be the best technique to "fuse" the texts together to get good results?

Example:

Actual text

§ 5.1: The contractor is obliged to announce the delay by 01.01.2019 at the latest. The identification-number to be used is OZ-771LS.

OCR tool 1

5 5.1 The contractor is obliged to announce the delay by O1.O1.2019 at the latest. The identification-number to be used is OZ77lLS.

OCR tool 2

§5.1: The contract or is obliged to announce theedelay by 01.O1. 2O19 at the latest. The identification number to be used is O7-771LS

OCR tool 3

§ 5.1: The contractor is oblige to do announced he delay by 01.01.2019 at the latest. T he identification-number ti be used is OZ-771LS.

What could be a promising algorithm to fuse the outputs of OCR tools 1, 2 and 3 to recover the actual text?

My first idea was to create a "tumbling window" of arbitrary length, compare the words in the window, and for every position take the word that 2 out of 3 tools predict.

For example with window size 3:

[5 5.1 The]

[§5.1: The contract]

[§ 5.1: The]

As you can see, the algorithm doesn't work, because all three tools have different candidates for position one (5, §5.1: and §).
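To make the idea concrete, here is a minimal sketch of the position-wise 2-out-of-3 vote, applied to the first few words of each tool's output. The function name `positional_vote` and the use of `None` for unresolved positions are my own choices for illustration, and whitespace splitting is an assumption about how the words are tokenized.

```python
from collections import Counter

def positional_vote(outputs):
    """Naive fusion: split each text on whitespace and vote per token position."""
    token_lists = [text.split() for text in outputs]
    # Only compare positions that exist in every output.
    length = min(len(tokens) for tokens in token_lists)
    fused = []
    for i in range(length):
        candidates = [tokens[i] for tokens in token_lists]
        token, count = Counter(candidates).most_common(1)[0]
        # Keep a token only if at least two tools agree, else mark it unresolved.
        fused.append(token if count >= 2 else None)
    return fused

# Shortened prefixes of the three OCR outputs from the example.
ocr1 = "5 5.1 The contractor is obliged"
ocr2 = "§5.1: The contract or is obliged"
ocr3 = "§ 5.1: The contractor is oblige"

print(positional_vote([ocr1, ocr2, ocr3]))
# [None, None, 'The', 'contractor', 'is', 'obliged']
```

The first two positions stay unresolved because no two tools agree on them. The later positions only line up here because tool 2's merged "§5.1:" and split "contract or" happen to cancel each other out; in general a single split or merged word shifts every following position.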

Of course, it would be possible to add some tricks, such as Levenshtein distance, to allow some deviations, but I fear this would not be robust enough.
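A rough sketch of what such a Levenshtein tolerance could look like: the candidates for one position are compared pairwise, and the one closest to the others wins as long as at least one other candidate lies within a threshold. The helper `fuzzy_vote` and its `max_dist` parameter are hypothetical names for illustration, not an established method.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def fuzzy_vote(candidates, max_dist=1):
    """Pick the candidate with the smallest total distance to the others.

    A candidate is only eligible if at least one other candidate is within
    max_dist edits; returns None if no candidate is eligible.
    """
    best, best_total = None, None
    for i, cand in enumerate(candidates):
        dists = [levenshtein(cand, other)
                 for j, other in enumerate(candidates) if j != i]
        if min(dists) > max_dist:
            continue
        total = sum(dists)
        if best_total is None or total < best_total:
            best, best_total = cand, total
    return best

print(fuzzy_vote(["obliged", "obliged", "oblige"]))  # obliged
print(fuzzy_vote(["5", "§5.1:", "§"]))               # 5
```

The second call hints at the robustness concern: "5" and "§" are only one edit apart, so the tolerance accepts the misread "5" for position one even though it is wrong.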

Might be helpful to view this as a merging problem. Not a trivial topic, though. — afarley, Mar 26 '19 at 23:45


0 Answers