33

I'm trying to build a sample application in Java that reads an image file and outputs the text extracted from it. I found the Tesseract project, which seems promising, but it's written in C++. To use it, should I simply run it as a command-line program from my Java app with Runtime.exec(...)? Or is there a better solution, maybe a JAR? Also, although this is just a sample app, would running it as a command-line program be a concern from a scalability perspective?
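
For reference, the command-line route could look roughly like the sketch below. It uses ProcessBuilder (the more flexible sibling of Runtime.exec) and assumes a `tesseract` binary is installed and on the PATH, that passing `stdout` as the output base makes it print the recognized text, and that the `eng` language data is available. The class name `TesseractCli` and its helper methods are hypothetical, and `readAllBytes` needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class: shells out to the tesseract CLI (assumed to be on the PATH).
public class TesseractCli {

    // Builds the argument list: tesseract <image> stdout -l <lang>
    static List<String> buildCommand(Path image, String lang) {
        return Arrays.asList("tesseract", image.toString(), "stdout", "-l", lang);
    }

    // Runs tesseract on one image and returns whatever it wrote to standard output.
    static String ocr(Path image, String lang) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(image, lang));
        // Let tesseract's diagnostics go straight to this process's stderr.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();
        // Drain stdout before waiting, so the child cannot block on a full pipe.
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: TesseractCli <image>");
            return;
        }
        System.out.print(ocr(Paths.get(args[0]), "eng"));
    }
}
```

Each call starts a new operating-system process, so the cost is paid once per image; that is usually fine for a sample app but is the part to watch if throughput matters.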

Omnipresent
  • 3
    http://tess4j.sourceforge.net/, never tried this, though. – miku Dec 20 '12 at 14:49
  • Good find, @miku. It uses JNA, which would have been the answer to the request (that, or JNI), but no need to reinvent the wheel... – PhiLho Dec 20 '12 at 15:10

6 Answers

42

Tesseract is now provided by the javacv project. This is a far better option than using Tess4J, since all that is required is adding a single dependency to your pom file; the javacv Tesseract artifact then downloads and links the native libs for your platform automatically.

I've created an example maven project here - https://github.com/piersy/BasicTesseractExample

and also an example gradle project here - https://github.com/piersy/BasicTesseractExampleGradle

For this to work on my Ubuntu machine, I needed to update my install of libstdc++6.

I achieved this by running the following, although just installing libstdc++6 may work for you:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test 
sudo apt-get update
sudo apt-get install libstdc++6

Note that the Gradle project does not perform the automatic install, but it is still a hell of a lot simpler than using Tess4J.

The javacv project is here - https://github.com/bytedeco/javacpp-presets/tree/master/tesseract

Big props to the javacv guys. I only wish I'd found this earlier, as it would have saved me a week of getting Tess4J to work on multiple platforms!

PiersyP
  • 1
    tnx for the sample but I get error "java.lang.UnsatisfiedLinkError: no jnilept in java.library.path" when I "mvn clean install" on my Mac, any ideas? – Spring Jun 18 '15 at 19:26
  • 1
    detailed error "Library not loaded: /Users/saudet/projects/bytedeco/javacpp-presets/leptonica/cppbuild/macosx-x86_64/lib/liblept.4.dylib" – Spring Jun 18 '15 at 19:38
  • I also tried to install libstdc++6 on Mac, but couldn't find any info on what to install and how – Spring Jun 19 '15 at 08:30
  • 2
    You may need to build and install leptonica from source. I downloaded leptonica-1.71, extracted it, and then ran the following from inside the extracted dir: 'CPPFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib" ./configure && make && sudo make install' – PiersyP Jun 19 '15 at 11:31
  • thanks! a few questions:are these commands/instructions that I also can run on Mac? and why some libraries are missing if javacpp says that they will parse and load all needed native files with a maven goal, is that a problem on machine or c++ compiler? – Spring Jun 19 '15 at 11:41
  • 2
    You can run those on a Mac. javacv does a pretty good job of giving you all the binaries required, but just as a Maven dependency can have hundreds of transitive dependencies, so can a C library. If javacv put everything inside the jar, it would contain a large part of Linux, so I think they have to draw the line somewhere and assume the existence of certain libraries. If you look at the javacpp team's releases, you can see that the number of shipped libs is slowly increasing, probably because they have run into exactly the same type of problems you are experiencing. – PiersyP Jun 19 '15 at 13:21
  • tnx in another forum I read that creating a symbolic link solves the problem do you agree with it? $ ln -s /usr/local/lib/liblept.4.dylib /usr/local/lib/liblept.3.dylib – Spring Jun 19 '15 at 15:34
  • I tried everything we talked above, still same error: Library not loaded: /Users/saudet/projects/bytedeco/javacpp-presets/tesseract/cppbuild/macosx-x86_64/lib/libtesseract.3.dylib – Spring Jun 19 '15 at 19:55
  • Help or advice of where I can find help is very appreciated – Spring Jun 19 '15 at 19:56
  • I just tested that tesseract works from commandline, but still not from java – Spring Jun 19 '15 at 22:20
  • 3
    I forked BasicTesseractExampleGradle and created a version that you can build with Maven rather than Gradle - see [BasicTesseractExampleMaven](https://github.com/george-hawkins/BasicTesseractExampleMaven). – George Hawkins Jun 13 '16 at 12:39
  • The Gradle version worked for me out of the box (No Link Errors), excellent work! – Jason D Sep 27 '16 at 20:47
  • Hi @Spring, I had the same issue. What I did was import all the jars from the Maven dependencies into my own Gradle project, along with the tessdata folder. Everything works fine. I hope you have already resolved this. – Tarun Kundhiya Aug 18 '17 at 10:05
12

I have used the Tesseract project in my Java code. All you need to do is:

  1. Get the Tess4J JNA wrapper for Tesseract.
  2. Open the Tess4J project in your IDE and add the source packages and libs to your own project.
  3. Write code that creates an instance of the Tesseract class, then use it to perform the OCR.

Please have a look at this: http://tphangout.com/?p=18

It gives instructions on how to build a java project to read an image and convert it into text using the tesseract OCR API.

Don Cheadle
Raja Yogan
8

Have you tried Tess4J? http://tess4j.sourceforge.net/

It is a JNA wrapper for Tesseract on Windows.

kokosing
  • 1
    @manu [jtesseract](https://github.com/tesseract4java/jtesseract) also contains the [64 bit DLLs for Tesseract 3.03](https://github.com/tesseract4java/jtesseract/releases/tag/tesseract-v3.03). [edit: link fixed] – pvorb May 23 '14 at 18:51
6

I've forked the basic Git repo and updated it so that it is compatible with Tesseract OCR 4.x.x and bytedeco javacpp-presets 1.4.3.

BasicTesseractExampleVer4

asmmahmud
  • 3
    This looks a lot more straightforward than tess4j. Just getting started with tesseract on java, and I think this is the way to go. – Fred Andrews May 09 '19 at 00:46
0

I just tried https://github.com/piersy/BasicTesseractExample

It looks like it works, using just this one dependency:

<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>tesseract</artifactId>
    <version>3.03-rc1-0.11</version>
</dependency>

which is here: https://github.com/bytedeco/javacpp-presets/tree/master/tesseract

cheers corrado

ccampisano
-1

I used the approach from "How to Test Toast Messages using Appium?"

with this dependency:

    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>3.2.1</version>
    </dependency>
SpyZip