-1

I need to extract text from an image, so I looked at a few OCR libraries:

  1. Tess4j

That didn't work, so I moved to Apache Tika.

In Apache Tika, I tried both ImageParser and JpegParser. They return the file metadata, but not the text in my image file.

ΦXocę 웃 Пepeúpa ツ
Ajay Yadav
  • Did you [try reading the Apache Tika documentation on performing OCR](https://wiki.apache.org/tika/TikaOCR)? If yes, where did you get stuck? If not, why not? And what happens when you do? – Gagravarr Apr 16 '16 at 18:21
  • Yes, I read the Tika documentation. The code setup works fine, and the JPEG parser returns text from some images, but not from the one I actually need to extract text from. – Ajay Yadav Apr 17 '16 at 03:38

2 Answers

3

You can also run Tika from the command line. Run it on just the images you want to perform OCR on:

java -jar ./tika-app/target/tika-app-1.13-SNAPSHOT.jar -t ~/Desktop/tess.png

Tika uses Tesseract internally to perform OCR, so you need to have it installed and on your PATH.
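
If you are not sure whether Tesseract is visible to the JVM, a quick check like the one below may help. This is only a sketch using the standard library: the class name `TesseractPathCheck` and the helper `findOnPath` are made up for illustration and are not part of Tika. It assumes the executable is called `tesseract` (or `tesseract.exe` on Windows).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class TesseractPathCheck {

    // Hypothetical helper: returns the first executable file called `name`
    // (or `name.exe`) found in the directories of a PATH-style string,
    // or an empty Optional if none is found.
    public static Optional<Path> findOnPath(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String candidate : new String[] {name, name + ".exe"}) {
                try {
                    Path p = Paths.get(dir, candidate);
                    if (Files.isRegularFile(p) && Files.isExecutable(p)) {
                        return Optional.of(p);
                    }
                } catch (InvalidPathException e) {
                    // Malformed PATH entry: skip it and keep looking.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: the binary is named "tesseract" on this system.
        Optional<Path> found = findOnPath("tesseract", System.getenv("PATH"));
        System.out.println(found
                .map(p -> "tesseract found at " + p)
                .orElse("tesseract not found on PATH, so OCR will probably not run"));
    }
}
```

Run it with the same user and environment as your Tika process, since the PATH can differ between a login shell and a service.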

cafed00d
1

For image processing, Tesseract is the best API, and it can be used from Java. Give it a try. You can find more details here.

Community
Balayesu Chilakalapudi
  • I am using Tesseract on Linux. It does extract text from the image, but it misses some characters and reads others as special characters. – Ajay Yadav Apr 17 '16 at 04:04
  • You can improve accuracy with a whitelist of characters, as described in http://pretius.com/using-tesseract-ocr-to-extract-scanned-invoice-data-in-java-application/ – Balayesu Chilakalapudi Apr 17 '16 at 07:48
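
To illustrate the whitelist idea from the last comment, here is a rough sketch that calls the `tesseract` command-line tool from Java and passes the `tessedit_char_whitelist` variable with `-c`. It is not the code from the linked article. The class name `WhitelistOcr` and the helper `buildCommand` are hypothetical, and the sketch assumes `tesseract` is installed and on the PATH. Whether the whitelist is honored can depend on the Tesseract version and engine mode, so treat this as a starting point.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class WhitelistOcr {

    // Hypothetical helper: builds a tesseract command line that writes the
    // recognized text to stdout and restricts recognition to `whitelist`.
    public static List<String> buildCommand(String imagePath, String whitelist) {
        return Arrays.asList(
                "tesseract",
                imagePath,
                "stdout",                                  // output base: print to stdout
                "-c", "tessedit_char_whitelist=" + whitelist);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: java WhitelistOcr <image> <allowed-characters>");
            return;
        }
        // Assumption: tesseract is installed and reachable through PATH.
        Process process = new ProcessBuilder(buildCommand(args[0], args[1]))
                .inheritIO()                               // show tesseract's output directly
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            System.err.println("tesseract exited with code " + exitCode);
        }
    }
}
```

For example, `java WhitelistOcr invoice.png 0123456789.,` (with `invoice.png` standing in for your own file) would limit the output to digits and two punctuation characters, which can cut down on stray special characters when the image only contains numbers.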