
I want to build a project that takes a PDF file, extracts the printed names and the handwritten numbers from it, and then puts them into a CSV file (Excel file).
Please note that the PDF files contain a table in which one column holds the names and the other column holds the handwritten marks.
So can you suggest an example, or the best Python frameworks or engines out there?
Please see the example image below:
[Image: example of the image or PDF file]
Thank you in advance!
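For the last step described above, writing the extracted pairs to a CSV file that Excel can open, Python's standard `csv` module is enough. A minimal sketch, assuming the names and marks have already been extracted into `(name, mark)` pairs; `write_marks_csv` and the `Name`/`Mark` headers are hypothetical choices:

```python
import csv
from pathlib import Path


def write_marks_csv(rows, path):
    """Write (name, mark) pairs to a CSV file and return its Path.

    The "Name"/"Mark" header is a placeholder chosen for this sketch.
    """
    path = Path(path)
    # newline="" lets the csv module control line endings itself;
    # "utf-8-sig" prepends a byte-order mark so Excel recognises UTF-8
    # and keeps accented characters in names intact.
    with path.open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Mark"])
        for name, mark in rows:
            writer.writerow([name, mark])
    return path
```

For example, `write_marks_csv([("Alice Martin", 15)], "marks.csv")` (a made-up row) produces a two-column file with a header row that opens directly in Excel.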

sxeros
  • For the handwritten part, you have to train a model to detect handwritten digits, and for the scanned document part you can use Tesseract. I would suggest using Keras. – geekzeus Feb 12 '20 at 12:32
  • Did you try Google's image APIs? If you can use non-local code... – B. Go Feb 12 '20 at 14:29

2 Answers


Tesseract, an open-source OCR engine that you can call from Python, can do what you are looking for. You may want to have a look at this blog post, which describes the basic setup: Tesseract for Python
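A minimal sketch of that route, assuming the third-party packages `pdf2image` (which needs Poppler installed) and `pytesseract` (which needs the Tesseract binary installed); the helper names are hypothetical. The PDF pages are rendered to images first, because Tesseract reads images rather than PDFs:

```python
def pdf_pages_to_images(pdf_path, dpi=300):
    """Render every page of a PDF to a PIL image.

    Requires the third-party pdf2image package plus Poppler; the import is
    inside the function so the other helpers work without it.
    """
    from pdf2image import convert_from_path
    return convert_from_path(pdf_path, dpi=dpi)


def ocr_printed_text(image, lang="eng"):
    """Run Tesseract on one image and return the recognised text.

    Requires pytesseract and a separately installed Tesseract binary.
    """
    import pytesseract
    return pytesseract.image_to_string(image, lang=lang)


def clean_ocr_line(raw):
    """Collapse runs of whitespace (including Tesseract's trailing form feed)."""
    return " ".join(raw.split())
```

Typical use: render the pages with `pdf_pages_to_images("marks.pdf")` (a placeholder path), pass each page to `ocr_printed_text`, and tidy each non-empty line of the result with `clean_ocr_line`.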

sxeros
  • The problem is that Tesseract (or pytesseract) can only detect text written in a specific language, like a full paragraph in English or French, not names! Names are a different type of text; I need to know how I can detect names. – Feb 12 '20 at 13:43
  • @Adem Youssef Another way could be to build and train your own CNN. I would suggest using Keras with TensorFlow. For numbers, you can use an MNIST-trained network. There are similar solutions for letter detection; you may have to read your way through this topic. – sxeros Feb 12 '20 at 13:57
  • @Adem Youssef OpenCV provides useful functions to find and detect edges and similar features; maybe that is helpful too. – sxeros Feb 12 '20 at 13:58
  • Thank you @sxeros for your help, it kind of put me on the right track. – Feb 14 '20 at 09:27
  • @Adem Youssef glad to help :D – sxeros Feb 14 '20 at 09:29
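Following the OpenCV suggestion, one way to separate the two columns is to find the table's ruling lines and cut the page into cells. A sketch under these assumptions: the table has solid, roughly axis-aligned lines; OpenCV (`cv2`) is used only for loading and Otsu binarization; and the line search itself is a simple NumPy projection profile (a row or column that is mostly ink is treated as a line) rather than OpenCV's morphological line extraction. All function names are hypothetical:

```python
import numpy as np


def load_binary(path):
    """Load a scan as a 0/1 uint8 array where 1 = ink, using OpenCV."""
    import cv2  # imported here so the NumPy helpers work without OpenCV
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    # Otsu chooses the threshold; THRESH_BINARY_INV turns dark ink into 255.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return (binary > 0).astype(np.uint8)


def find_lines(binary, axis, min_fill=0.8):
    """Return inclusive (start, end) pixel runs of ruling lines.

    axis=1 averages across each row and so finds horizontal lines;
    axis=0 averages down each column and so finds vertical lines.
    """
    fill = binary.mean(axis=axis)
    idx = np.flatnonzero(fill >= min_fill)
    if idx.size == 0:
        return []
    # Consecutive indices belong to the same (possibly thick) line.
    runs = np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1)
    return [(int(r[0]), int(r[-1])) for r in runs]


def gaps_between(runs):
    """Turn line runs into (start, stop) slice bounds for the space between them."""
    return [(a_end + 1, b_start)
            for (_, a_end), (b_start, _) in zip(runs, runs[1:])
            if b_start > a_end + 1]


def split_cells(binary, min_fill=0.8):
    """Split a binarized table into rows, each row a list of cell arrays."""
    rows = []
    for top, bottom in gaps_between(find_lines(binary, axis=1, min_fill=min_fill)):
        strip = binary[top:bottom]
        # Inside a strip, vertical ruling lines span the full height.
        cols = gaps_between(find_lines(strip, axis=0, min_fill=min_fill))
        rows.append([strip[:, left:right] for left, right in cols])
    return rows
```

Each row then yields a name cell to pass to Tesseract and a mark cell for a digit classifier. Skewed scans would need deskewing first, which this sketch does not do.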

You can use pytesseract for the printed text. pytesseract is a Python wrapper for Google's Tesseract optical character recognition (OCR) engine; it will help you recognize text in images.
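Besides plain text, pytesseract can return word-level boxes with `image_to_data`, which makes it possible to keep only the words that fall inside the name column. A sketch assuming the x-range of that column is known (for example, measured on one scanned page); `ocr_words` and `names_by_line` are hypothetical helpers:

```python
def ocr_words(image):
    """Word-level OCR: a dict of parallel lists (text, conf, left, top, ...)."""
    import pytesseract  # needs the Tesseract binary installed separately
    return pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)


def names_by_line(data, x_min, x_max, min_conf=60.0):
    """Join confident words whose left edge lies in [x_min, x_max) into lines.

    `data` has the shape returned by image_to_data; x_min/x_max bound the
    name column in pixels and are assumptions to measure on your own scans.
    """
    lines = {}
    for i, word in enumerate(data["text"]):
        word = word.strip()
        # Non-word entries have conf -1; depending on the pytesseract
        # version, conf may be a number or a string, hence float().
        if not word or float(data["conf"][i]) < min_conf:
            continue
        if x_min <= data["left"][i] < x_max:
            key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
            lines.setdefault(key, []).append(word)
    # dicts keep insertion order, i.e. Tesseract's reading order.
    return [" ".join(words) for words in lines.values()]
```

The confidence cut-off of 60 is an arbitrary starting point to tune on real scans.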

For the handwritten digits, you could use TensorFlow or Keras with the MNIST dataset.
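A minimal Keras sketch of the MNIST route, assuming TensorFlow 2.x with its bundled `keras` and a digit crop that has already been binarized (nonzero = ink). MNIST digits are white on black and centred in a 28x28 frame, so a crop has to be brought into that shape before the model sees it; the preprocessing below is only a rough approximation of MNIST's normalization, and all function names are hypothetical:

```python
import numpy as np


def to_mnist_format(cell):
    """Turn a binarized digit crop into a 28x28 float32 array in [0, 1]."""
    ys, xs = np.nonzero(cell)
    if ys.size == 0:
        return np.zeros((28, 28), dtype=np.float32)
    digit = cell[ys.min():ys.max() + 1, xs.min():xs.max() + 1] > 0
    # Pad to a square so the aspect ratio survives the resize.
    h, w = digit.shape
    size = max(h, w)
    square = np.zeros((size, size), dtype=np.float32)
    top, left = (size - h) // 2, (size - w) // 2
    square[top:top + h, left:left + w] = digit
    # Nearest-neighbour resize to 20x20, then a 4-pixel border -> 28x28.
    idx = np.arange(20) * size // 20
    return np.pad(square[np.ix_(idx, idx)], 4)


def build_mnist_cnn():
    """A small CNN for 10-class digit classification."""
    from tensorflow import keras  # heavy optional dependency, imported lazily
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(64, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_on_mnist(epochs=3):
    """Train the CNN on MNIST (downloaded by Keras on first use)."""
    from tensorflow import keras
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    # Scale pixels to [0, 1] and add the channel axis Conv2D expects.
    x_train = x_train[..., None].astype("float32") / 255.0
    x_test = x_test[..., None].astype("float32") / 255.0
    model = build_mnist_cnn()
    model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test))
    return model


def predict_digit(model, cell):
    """Classify one binarized digit crop with a trained model."""
    x = to_mnist_format(cell)[None, ..., None]
    return int(np.argmax(model.predict(x, verbose=0), axis=1)[0])
```

`predict_digit(train_on_mnist(), cell)` returns a value from 0 to 9. A mark with more than one digit has to be split into single digits first, which is not shown here, and handwriting on real scans may differ enough from MNIST that fine-tuning on your own samples is worth trying.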

Photodeus
Faizan Alam