I am starting to play with Tess4J to see all that it can do. From the tests I have done so far, if I type text in a structured, horizontal fashion (as I am doing now) within an image file, I can pick up the text. If, however, I rotate the text at all, I cannot pick it up with Tess4J. Should Tess4J be able to handle text at different angles, such as vertical or 45 degrees?
3 Answers
Deskewing with Tess4j
Take a look at the source code of Tess4J (a Java JNA wrapper for Tesseract).
I recently posted this answer (Java image library to deskew and crop images)
Answer:
You can combine ImageDeskew.getSkewAngle() with ImageHelper.rotateImage(BufferedImage image, double angle).
There is an example of how to use it in the test folder of the tess4j project, in Tesseract1Test.java:
    public void testDoOCR_SkewedImage() throws Exception {
        logger.info("doOCR on a skewed PNG image");
        File imageFile = new File(this.testResourcesDataPath, "eurotext_deskew.png");
        BufferedImage bi = ImageIO.read(imageFile);
        ImageDeskew id = new ImageDeskew(bi);
        double imageSkewAngle = id.getSkewAngle(); // determine skew angle
        if ((imageSkewAngle > MINIMUM_DESKEW_THRESHOLD || imageSkewAngle < -(MINIMUM_DESKEW_THRESHOLD))) {
            bi = ImageHelper.rotateImage(bi, -imageSkewAngle); // deskew image
        }
        String expResult = "The (quick) [brown] {fox} jumps!\nOver the $43,456.78 <lazy> #90 dog";
        String result = instance.doOCR(bi);
        logger.info(result);
        assertEquals(expResult, result.substring(0, expResult.length()));
    }
-
Please don't add the same answer to multiple questions. Answer the best one and flag the rest as duplicates. – Bhargav Rao Mar 27 '16 at 14:05
osdetect.cpp has some mechanism for orientation and script detection within Tesseract. This is not exposed through Tess4J, so in this case it is better to interact with the original C++ code. With Tesseract, one uses the psm argument (see this SO question for a full list of values) for "auto-orientation". For example, -psm 0 (orientation and script detection only) should provide auto-orientation.

Tess4J does not offer any extra functionality beyond being a simple wrapper on top of Tesseract. As such, you will have to determine the skew angle and rotate the image before OCR.
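As a rough illustration of the "rotate the image before OCR" step, here is a minimal sketch that uses only the standard java.awt classes. It is not Tess4J's own code; the class name DeskewSketch and the method rotate are hypothetical names chosen for this example, and the angle is assumed to come from whatever skew detection you use. The canvas is enlarged so the corners of the rotated page are not clipped, and the background is filled with white.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of Tess4J.
class DeskewSketch {

    /**
     * Rotates src about its centre by angleDegrees (positive values turn the
     * image clockwise on screen) onto a white canvas big enough to hold it.
     */
    static BufferedImage rotate(BufferedImage src, double angleDegrees) {
        double theta = Math.toRadians(angleDegrees);
        double sin = Math.abs(Math.sin(theta));
        double cos = Math.abs(Math.cos(theta));
        int w = src.getWidth();
        int h = src.getHeight();

        // Bounding box of the rotated rectangle.
        int newW = (int) Math.round(w * cos + h * sin);
        int newH = (int) Math.round(h * cos + w * sin);

        BufferedImage dst = new BufferedImage(newW, newH, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, newW, newH);
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);

        // Move the source centre to the origin, rotate, then move it to the
        // centre of the new canvas (transforms apply in reverse order).
        AffineTransform at = new AffineTransform();
        at.translate(newW / 2.0, newH / 2.0);
        at.rotate(theta);
        at.translate(-w / 2.0, -h / 2.0);
        g.drawImage(src, at, null);
        g.dispose();
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage turned = rotate(page, 90);
        System.out.println(turned.getWidth() + "x" + turned.getHeight());
    }
}
```

To undo a measured skew you would pass the negated angle, and then hand the returned BufferedImage to doOCR as usual.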

-
So that means if a document has text at multiple angles (like if someone scribbled handwritten text in the margins of the document) tess4j won't handle it? – demongolem Nov 12 '12 at 01:59
-
Again, it depends on Tesseract engine itself. Can Tesseract handle your image? – nguyenq Nov 12 '12 at 14:52
-
The latest distribution of Tess4J includes helper methods that can determine the skew angle and rotate the image. – nguyenq Jun 16 '13 at 22:24