How to extract text from PDF image

Question

I want to extract data from a PDF that contains an image of a form. In the form, each letter is written inside its own small square box, for example, name: t e s t.

I tried Tesseract OCR but could not get the desired result.

I also tried the commercial ABBYY product, which worked, but I want to use a free, Java-based API.

Below is an example:

score 2 · Accepted Answer · answered Jun 07 '18 at 21:23

Nicomsoft OCR SDK, which is a free SDK, extracted the text from my PDF, and the results are satisfactory.

It supports a wide range of technologies. I am now trying to integrate it into my application.

Link: https://www.nicomsoft.com/

raghavendra prasad gudipalli

score 0 · Answer 2 · answered May 12 '18 at 23:20

As far as free OCR goes, Tesseract is as good as it gets.

Alternatively, you could look at the Windows 10 UWP OCR offering.
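
If you stay with Tesseract, one thing that may help with boxed forms like the one in the question is erasing the box borders before running OCR, since the lines around each letter can be mistaken for strokes. The sketch below is only an illustration of that idea using the standard Java imaging classes; it is not part of Tesseract, and the class name BoxLineRemover and the parameters minRun and threshold are hypothetical names chosen for this example. It assumes the borders are straight, axis-aligned, and longer than any stroke in the letters.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: whitens long straight runs of dark pixels (box borders). */
final class BoxLineRemover {
    private static final int WHITE = 0xFFFFFF;

    /** True if the pixel's average brightness (0-255) is below the threshold. */
    static boolean isDark(int rgb, int threshold) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r + g + b) / 3 < threshold;
    }

    /**
     * Returns a copy of src in which every horizontal or vertical run of
     * dark pixels at least minRun pixels long is painted white.
     * Assumption: minRun is larger than the longest stroke in any letter.
     */
    static BufferedImage removeLines(BufferedImage src, int minRun, int threshold) {
        int w = src.getWidth(), h = src.getHeight();
        boolean[][] dark = new boolean[h][w];
        boolean[][] erase = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dark[y][x] = isDark(src.getRGB(x, y), threshold);

        // Mark long horizontal runs.
        for (int y = 0; y < h; y++) {
            int start = 0;
            for (int x = 0; x <= w; x++) {
                if (x == w || !dark[y][x]) {
                    if (x - start >= minRun)
                        for (int i = start; i < x; i++) erase[y][i] = true;
                    start = x + 1;
                }
            }
        }
        // Mark long vertical runs.
        for (int x = 0; x < w; x++) {
            int start = 0;
            for (int y = 0; y <= h; y++) {
                if (y == h || !dark[y][x]) {
                    if (y - start >= minRun)
                        for (int i = start; i < y; i++) erase[i][x] = true;
                    start = y + 1;
                }
            }
        }

        // Copy the image, painting the marked pixels white.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out.setRGB(x, y, erase[y][x] ? WHITE : src.getRGB(x, y));
        return out;
    }
}
```

To try it, render the PDF page to an image, load it with ImageIO.read, call BoxLineRemover.removeLines with a minRun somewhat shorter than the box side, save the result with ImageIO.write, and pass that file to Tesseract. Letters that touch a border lose the overlapping pixels, so results will vary with scan quality.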

fistynuts

score 0 · Answer 3 · answered May 14 '18 at 10:43

I am not sure about the free ones out there, but you can definitely try TotalPDFConverterOCR.

It offers a wide range of features, such as converting to DOC, images, etc.

nashcharles

It did not work with the software mentioned; it uses Tesseract internally. – raghavendra prasad gudipalli May 15 '18 at 23:04
