How to extract data from boxes of a form using OCR technology?

Asked May 09 '23 at 08:58

How can we use OCR technology to extract numbers from a form laid out in the following format?

I tried easyOCR and Tesseract, and both fail when the numbers sit inside boxes like these. Even when the numbers are typed (not handwritten), the boxes are still a problem, because both tools generally perform well when the boxes are absent.

What would be a good way to extract the numbers from these boxes, given that the boxes are sometimes contiguous and connected to each other?

Has significant work been done on this? I would expect data extraction from documents to be a common problem.

Thanks

Code:

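import easyocr

def DL_OCR(path):
    # Build an English-only reader (this loads the models, so reuse it across calls if possible)
    reader = easyocr.Reader(['en'])
    # readtext returns a list of (bounding_box, text, confidence) tuples
    result = reader.readtext(path)
    # Join the recognized text fragments with single spaces
    return " ".join(text for _, text, _ in result)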
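
Since both engines reportedly do fine when the boxes are absent, one direction worth trying is to strip the box lines from the image before running OCR. The following is only a minimal sketch of that idea, not a tested solution for real forms. It assumes the page has already been binarized into rows of 0/1 values (1 = ink), and it treats any horizontal or vertical run of ink at least `min_len` pixels long as a box edge. The names `remove_long_runs` and `min_len` are hypothetical and are not part of easyOCR or Tesseract.

```python
def remove_long_runs(binary, min_len):
    """Return a copy of `binary` (rows of 0/1, 1 = ink) in which every
    horizontal or vertical run of ink at least `min_len` pixels long
    has been cleared. Assumption: such long runs are box edges."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    to_clear = set()

    # Horizontal pass: mark runs of ink long enough to be a box edge.
    for y in range(height):
        x = 0
        while x < width:
            if binary[y][x]:
                start = x
                while x < width and binary[y][x]:
                    x += 1
                if x - start >= min_len:
                    to_clear.update((y, cx) for cx in range(start, x))
            else:
                x += 1

    # Vertical pass: same idea, scanning each column of the original image.
    for x in range(width):
        y = 0
        while y < height:
            if binary[y][x]:
                start = y
                while y < height and binary[y][x]:
                    y += 1
                if y - start >= min_len:
                    to_clear.update((cy, x) for cy in range(start, y))
            else:
                y += 1

    # Build a new grid so the input is left unmodified.
    return [
        [0 if (y, x) in to_clear else binary[y][x] for x in range(width)]
        for y in range(height)
    ]
```

In this sketch `min_len` would have to be larger than the longest straight stroke of any digit, otherwise parts of the digits are erased too. Digits that touch a box edge can also lose pixels, because the stroke and the edge form one longer run. The cleaned grid would still need to be converted back into an image format that the OCR library accepts, and image-processing libraries with morphological operations can do the same kind of line removal much faster on real scans.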

Sadaf Shafi

0 Answers