I need to calculate OCR character accuracy.
Sample ground-truth value:
Non sinking ship is friendship
Sample OCR output:
non singing ship is finedship
Areas of concern are:
- missed characters
- extra characters
- misplaced characters
Character accuracy is defined as the number of ground-truth characters that appear in the OCR output in the correct position, divided by the total number of ground-truth characters.
I need a Python script to compute this accuracy. My initial implementation is as follows:
import re

ground_value = "Non sinking ship is friendship"
ocr_value = "non singing ship is finedship"

# Remove all whitespace from the ground-truth string
ground_value_characters = re.sub(r'\s+', '', ground_value)
# Remove all whitespace from the OCR string
ocr_value_characters = re.sub(r'\s+', '', ocr_value)

total_characters = float(len(ground_value_characters))


def find_matching_characters(ground, ocr):
    # Count each ground-truth character that still occurs anywhere in the OCR string
    total = 0
    for char in ground:
        if char in ocr:
            total = total + 1
            # Consume the matched character so it is not counted twice
            ocr = ocr.replace(char, '', 1)
    return total


found_characters = find_matching_characters(ground_value_characters,
                                            ocr_value_characters)
accuracy = found_characters / total_characters
This counts characters regardless of where they occur, so it does not give the position-aware accuracy I was hoping for. Any help would be appreciated.
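
For reference, one common way to make the count sensitive to missed, extra, and misplaced characters is to base it on the Levenshtein (edit) distance. The sketch below is only one possible reading of the definition above, not a confirmed solution: it assumes accuracy = (N - errors) / N, where N is the number of ground-truth characters and errors is the minimum number of insertions, deletions, and substitutions. The function names `edit_distance` and `character_accuracy` are made up for illustration, the comparison is case-sensitive, and the result is clamped at 0 when the OCR output contains more errors than there are ground-truth characters.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def character_accuracy(ground, ocr):
    """Assumed definition: (N - errors) / N, clamped to the range [0, 1]."""
    # Same preprocessing as in the question: drop all whitespace
    ground = re.sub(r'\s+', '', ground)
    ocr = re.sub(r'\s+', '', ocr)
    if not ground:
        return 1.0 if not ocr else 0.0
    errors = edit_distance(ground, ocr)
    return max(0.0, float(len(ground) - errors) / len(ground))


print(character_accuracy("Non sinking ship is friendship",
                         "non singing ship is finedship"))
```

Lowercasing both strings before the comparison would make the measure case-insensitive, if "N" versus "n" should not count as an error.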