I need to create a searchable PDF from multiple 24-bit JPG images. I'm using tess-two, which by default comes with libpng. The problem is that Tesseract outputs a corrupt PDF: the images are missing from the PDF, although the recognized text is still present.
I have no problems when using PNG files, but my input is JPG images. Converting the JPGs to PNG with the following code is very time-consuming:
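    import android.graphics.Bitmap;
    import android.graphics.BitmapFactory;
    import java.io.File;
    import java.io.FileOutputStream;

    // Decode the JPEG into a 32-bit ARGB bitmap.
    BitmapFactory.Options options = new BitmapFactory.Options();
    options.inPreferredConfig = Bitmap.Config.ARGB_8888;
    Bitmap bitmap = BitmapFactory.decodeFile("myimage.jpg", options);

    // Re-encode as PNG; PNG is lossless, so the quality argument (0) is ignored.
    File file = new File("myoutputimage.png");
    try (FileOutputStream fOut = new FileOutputStream(file)) {
        bitmap.compress(Bitmap.CompressFormat.PNG, 0, fOut);
        fOut.flush();
    } catch (Exception e) {
        e.printStackTrace();
    }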
On my machine, it takes 2 seconds to create one PNG file.
I have already compiled tess-two with libjpeg, but that did not work either. Is it possible to create a searchable PDF with Tesseract from JPG input files?
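To rule out the input files themselves, a minimal plain-Java sketch like the one below could confirm what a JPG really contains: bits per sample, number of components (8 x 3 = 24 bits per pixel), and whether it is baseline or some other encoding. The class name `JpegHeader` and its members are made up for illustration and are not part of tess-two, Tesseract, or Leptonica. It reads only the start-of-frame (SOF) segment and assumes a well-formed file with no standalone markers before it.

```java
import java.io.IOException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of tess-two, Tesseract, or Leptonica.
final class JpegHeader {
    final int sofMarker;     // 0xC0 = baseline DCT, 0xC2 = progressive DCT
    final int bitsPerSample; // sample precision, usually 8
    final int components;    // 1 = grayscale, 3 = colour (YCbCr), 4 = CMYK
    final int width;
    final int height;

    private JpegHeader(int sofMarker, int bitsPerSample, int components, int width, int height) {
        this.sofMarker = sofMarker;
        this.bitsPerSample = bitsPerSample;
        this.components = components;
        this.width = width;
        this.height = height;
    }

    int bitsPerPixel() {
        return bitsPerSample * components;
    }

    boolean isBaseline() {
        return sofMarker == 0xC0;
    }

    // Walks the marker segments until the first start-of-frame (SOF) segment.
    static JpegHeader parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default, like JPEG
        try {
            if ((buf.getShort() & 0xFFFF) != 0xFFD8) {
                throw new IllegalArgumentException("missing SOI marker, not a JPEG");
            }
            while (true) {
                if ((buf.get() & 0xFF) != 0xFF) {
                    throw new IllegalArgumentException("expected a marker");
                }
                int marker = buf.get() & 0xFF;
                while (marker == 0xFF) {
                    marker = buf.get() & 0xFF; // skip fill bytes
                }
                if (marker == 0xDA || marker == 0xD9) {
                    throw new IllegalArgumentException("no SOF segment before image data");
                }
                int length = buf.getShort() & 0xFFFF; // includes the two length bytes
                // SOF markers are 0xC0..0xCF except DHT (0xC4), JPG (0xC8), DAC (0xCC).
                boolean isSof = marker >= 0xC0 && marker <= 0xCF
                        && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
                if (isSof) {
                    int bits = buf.get() & 0xFF;
                    int height = buf.getShort() & 0xFFFF;
                    int width = buf.getShort() & 0xFFFF;
                    int comps = buf.get() & 0xFF;
                    return new JpegHeader(marker, bits, comps, width, height);
                }
                if (length < 2) {
                    throw new IllegalArgumentException("invalid segment length");
                }
                buf.position(buf.position() + length - 2); // skip this segment's payload
            }
        } catch (BufferUnderflowException e) {
            throw new IllegalArgumentException("truncated JPEG header", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JpegHeader <file.jpg>");
            return;
        }
        JpegHeader h = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(h.width + "x" + h.height + ", " + h.bitsPerPixel()
                + " bits per pixel, SOF marker 0x" + Integer.toHexString(h.sofMarker)
                + (h.isBaseline() ? " (baseline)" : " (not baseline)"));
    }
}
```

Running it against one of the input files (for example `java JpegHeader myimage.jpg`) prints the dimensions, bits per pixel, and SOF marker. This only describes the input; it does not by itself explain why the images are missing from the PDF.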