Imagine you have several OCR tools that read text from images, but none of them gives 100% accurate output. Combined, however, their results could come very close to the ground truth. What would be the best technique to "fuse" the texts together to get good results?

Example:

Actual text

§ 5.1: The contractor is obliged to announce the delay by 01.01.2019 at the latest. The identification-number to be used is OZ-771LS.

OCR tool 1

5 5.1 The contractor is obliged to announce the delay by O1.O1.2019 at the latest. The identification-number to be used is OZ77lLS.

OCR tool 2

§5.1: The contract or is obliged to announce theedelay by 01.O1. 2O19 at the latest. The identification number to be used is O7-771LS

OCR tool 3

§ 5.1: The contractor is oblige to do announced he delay by 01.01.2019 at the latest. T he identification-number ti be used is OZ-771LS.

What could be a promising algorithm for fusing the outputs of OCR tools 1, 2 and 3 to recover the actual text?

My first idea was to create a "tumbling window" of arbitrary length, compare the words inside the window, and, for every position, take the word that at least 2 of the 3 tools predict.

For example, with a window size of 3:

[5 5.1 The] 
[§5.1: The contract] 
[§ 5.1: The] 

As you can see, the algorithm doesn't work, because all three tools have different candidates for position one (5, §5.1: and §).
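The tumbling-window vote can be sketched in Python roughly like this. The names `tumbling_windows` and `naive_vote` are made up for illustration, and the sketch assumes a position is decided only when at least 2 of the 3 tools produce exactly the same word. Running it on the three outputs reproduces the failure: the first two positions get no quorum.

```python
from collections import Counter

# The three OCR outputs from the example above.
OCR_OUTPUTS = [
    "5 5.1 The contractor is obliged to announce the delay by O1.O1.2019 at the latest. "
    "The identification-number to be used is OZ77lLS.",
    "§5.1: The contract or is obliged to announce theedelay by 01.O1. 2O19 at the latest. "
    "The identification number to be used is O7-771LS",
    "§ 5.1: The contractor is oblige to do announced he delay by 01.01.2019 at the latest. "
    "T he identification-number ti be used is OZ-771LS.",
]


def tumbling_windows(words, size):
    """Split a list of words into consecutive, non-overlapping windows of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def naive_vote(texts, size=3, quorum=2):
    """Word-by-word majority vote inside tumbling windows taken at the same offsets.

    Returns one entry per position; None means fewer than `quorum`
    tools produced exactly the same word there.
    """
    windowed = [tumbling_windows(text.split(), size) for text in texts]
    fused = []
    # zip() stops at the shortest output, so trailing windows may be dropped.
    for windows in zip(*windowed):
        for candidates in zip(*windows):
            word, count = Counter(candidates).most_common(1)[0]
            fused.append(word if count >= quorum else None)
    return fused


print(naive_vote(OCR_OUTPUTS)[:3])  # [None, None, 'The']
```

Because the word lists are cut at fixed offsets, a single split or merged token (such as "contract or" in tool 2) shifts every later window, so the positions stop lining up at all.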

Of course, it would be possible to add some tricks, such as using the Levenshtein distance to allow small deviations, but I fear this would not be robust enough.
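One way to use the Levenshtein distance without fixed windows is to align the outputs character by character and vote per aligned position. The sketch below is an illustration under simplifying assumptions, not a robust solution; `edit_table`, `align_to_pivot` and `fuse` are hypothetical helper names, not from any library. It picks as pivot the output with the smallest total edit distance to the others, aligns every other output to that pivot with the standard Levenshtein dynamic program, and keeps the character that a majority of tools produce at each pivot position. This pivot-based alignment is a simplification of full multiple sequence alignment.

```python
from collections import Counter


def edit_table(a, b):
    """Levenshtein DP table: dp[i][j] is the edit distance between a[:i] and b[:j]."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a[i-1]
                           dp[i][j - 1] + 1,         # insert b[j-1]
                           dp[i - 1][j - 1] + cost)  # match or substitute
    return dp


def align_to_pivot(pivot, other):
    """Align `other` to `pivot` along one minimum-cost edit path.

    subs[i] is the character of `other` aligned with pivot[i] ("" if deleted);
    ins[i] is the text `other` inserts before pivot[i] (ins[-1]: after the end).
    """
    dp = edit_table(pivot, other)
    subs, ins = [""] * len(pivot), [""] * (len(pivot) + 1)
    i, j = len(pivot), len(other)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (pivot[i - 1] != other[j - 1]):
            subs[i - 1] = other[j - 1]
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ins[i] = other[j - 1] + ins[i]
            j -= 1
        else:
            i -= 1  # pivot[i-1] was deleted in `other`; subs[i-1] stays ""
    return subs, ins


def fuse(texts):
    """Character-level majority vote after aligning every text to a pivot."""
    quorum = len(texts) // 2 + 1
    # Pivot: the text with the smallest total edit distance to all the others.
    p = min(range(len(texts)),
            key=lambda k: sum(edit_table(texts[k], t)[-1][-1] for t in texts))
    pivot = texts[p]
    aligned = [align_to_pivot(pivot, t) for k, t in enumerate(texts) if k != p]
    out = []
    for i in range(len(pivot) + 1):
        # Insertions: the pivot votes for "nothing"; keep an insertion only with a quorum.
        gap = Counter([""] + [ins[i] for _, ins in aligned])
        gap_text, count = gap.most_common(1)[0]
        if gap_text and count >= quorum:
            out.append(gap_text)
        if i < len(pivot):
            # The pivot's character is counted first, so ties fall back to it.
            votes = Counter([pivot[i]] + [subs[i] for subs, _ in aligned])
            out.append(votes.most_common(1)[0][0])  # "" means the majority dropped it
    return "".join(out)


print(fuse(["the delay", "theedelay", "he delay"]))       # the delay
print(fuse(["01.01.2019", "O1.O1.2019", "01.O1. 2O19"]))  # 01.O1.2019
```

The second call shows the limit of plain voting: tools 1 and 2 both read the second "01" as "O1", so the majority outvotes the correct reading. Errors shared by several tools would need extra knowledge, for example the expected format of dates or identification numbers.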

Christian Vorhemus