Look at the source code of Tesseract.java:
@Override
public void createDocuments(String[] filenames, String[] outputbases, List<RenderedFormat> formats) throws TesseractException {
    if (filenames.length != outputbases.length) {
        throw new RuntimeException("The two arrays must match in length.");
    }

    init();
    setTessVariables();

    try {
        for (int i = 0; i < filenames.length; i++) {
            File workingTiffFile = null;
            try {
                String filename = filenames[i];

                // if PDF, convert to multi-page TIFF
                if (filename.toLowerCase().endsWith(".pdf")) {
                    workingTiffFile = PdfUtilities.convertPdf2Tiff(new File(filename));
                    filename = workingTiffFile.getPath();
                }

                TessResultRenderer renderer = createRenderers(outputbases[i], formats);
                createDocuments(filename, renderer);
                api.TessDeleteResultRenderer(renderer);
            } catch (Exception e) {
                // skip the problematic image file
                logger.error(e.getMessage(), e);
            } finally {
                if (workingTiffFile != null && workingTiffFile.exists()) {
                    workingTiffFile.delete();
                }
            }
        }
    } finally {
        dispose();
    }
}

/**
 * Creates documents.
 *
 * @param filename input file
 * @param renderer renderer
 * @throws TesseractException
 */
private void createDocuments(String filename, TessResultRenderer renderer) throws TesseractException {
    api.TessBaseAPISetInputName(handle, filename); //for reading a UNLV zone file
    int result = api.TessBaseAPIProcessPages(handle, filename, null, 0, renderer);

    if (result == ITessAPI.FALSE) {
        throw new TesseractException("Error during processing page.");
    }
}
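The exception is thrown at line 579 of that file, by the throw statement in the private createDocuments(String, TessResultRenderer) method. That method is called from the public method above, at line 551, inside the try-catch block whose catch body (line 555) only calls logger.error(e.getMessage(), e). So the exception is logged and never reaches your code.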
Now the question is: what do you really want to achieve?
If you don't want to see this log, you can configure slf4j to not print the log from this library.
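How you silence the logger depends on which backend slf4j is bound to. As a minimal sketch, assuming slf4j is routed to java.util.logging (for example through the slf4j-jdk14 binding) and assuming the library logs under a name that starts with net.sourceforge.tess4j, you could turn that logger off in code. The class name QuietTess4jLogging is made up for this illustration; with Logback or Log4j you would set the level in their own configuration instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Tess4J or slf4j.
final class QuietTess4jLogging {
    // Assumption: the library's loggers live under this name.
    static final String LIBRARY_LOGGER = "net.sourceforge.tess4j";

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so an unreferenced logger may be collected and lose its level.
    private static final Logger LIBRARY = Logger.getLogger(LIBRARY_LOGGER);

    private QuietTess4jLogging() {
    }

    /** Disables all output from the library logger and its children. */
    static void silence() {
        LIBRARY.setLevel(Level.OFF);
    }
}
```

Call QuietTess4jLogging.silence() once at startup, before the first createDocuments(...) call. Note that this hides every message from the library, not only this one.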
If you want to get the actual exception, it is not possible, as the library swallows it. I am not familiar with the library, but looking at the code it doesn't seem like there is any nice option: the method that throws the exception is private and is used only in this one place, inside the try-catch block. However, the exception is thrown when api.TessBaseAPIProcessPages(...) returns ITessAPI.FALSE, and api has a getter. So you could get it, call the TessBaseAPIProcessPages(...) method yourself, and check the result. This might not be ideal, as you will probably be processing every image twice.
Another solution is to fork the source code and modify it yourself. You might also want to contact the author and ask for advice; you could take it further and submit a pull request for them to approve and release.
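The "call it yourself and check the result" idea can be sketched independently of the library. Everything below is a stand-in: PageProcessor and FailureCollector are hypothetical names, and the constant values are assumed for the illustration. In real code, the processPages call would wrap api.TessBaseAPIProcessPages(...) with the handle and renderer you obtained from the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the native call; NOT the Tess4J API.
interface PageProcessor {
    int FALSE = 0; // plays the role of ITessAPI.FALSE (value assumed here)
    int TRUE = 1;

    /** Returns TRUE on success and FALSE on failure, like the native call. */
    int processPages(String filename);
}

final class FailureCollector {
    private FailureCollector() {
    }

    /** Processes each file and returns the names of those that failed. */
    static List<String> findFailures(PageProcessor processor, String[] filenames) {
        List<String> failed = new ArrayList<>();
        for (String filename : filenames) {
            // Check the return value yourself instead of relying on an
            // exception that the library catches and only logs.
            if (processor.processPages(filename) == PageProcessor.FALSE) {
                failed.add(filename);
            }
        }
        return failed;
    }
}
```

With the list of failed files in hand, you can throw your own exception or skip those files before handing the rest to createDocuments(...), at the cost of processing each image twice.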