I'm trying to use appimagetool (https://appimage.org/) to create a single-binary executable of the OCR program tesseract (https://github.com/tesseract-ocr). I have built tesseract on Ubuntu 19.10, and I want the executable to run on Ubuntu 14.01.
NOTE: I do not have control over the old version of Ubuntu, and I need features from a recent version of tesseract. I have already tried an existing AppImage of tesseract, and it fails in a way similar to what's detailed below.
Loosely following this tutorial: https://appiomatic.com/blog/creating-appimage-binary-manually-for-linux-from-your-app/ I created a tesseract.AppDir with the requisite layout:
tesseract.AppDir/AppRun
tesseract.AppDir/.DirIcon
tesseract.AppDir/tesseract.desktop
tesseract.AppDir/tesseract.png
tesseract.AppDir/usr
tesseract.AppDir/usr/bin
tesseract.AppDir/usr/bin/tesseract
tesseract.AppDir/usr/lib
tesseract.AppDir/usr/lib/libtesseract.so.5
tesseract.AppDir/usr/lib/libtesseract.so.5.0.0
...
tesseract.AppDir/usr/share
tesseract.AppDir/usr/share/tessdata
tesseract.AppDir/usr/share/tessdata/eng.traineddata
...
tesseract.AppDir/usr/share/tessdata/tessconfigs
...
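For reference, a skeleton like the one above can be scripted. This is only a minimal sketch, not the tutorial's exact steps: make_appdir is a hypothetical helper name, and the AppRun it writes assumes that the bundled libraries live in usr/lib and that tesseract finds its language data through the TESSDATA_PREFIX environment variable pointing at the tessdata directory. The .desktop file, the icon, the binary and the libraries still have to be copied in separately.

```shell
#!/bin/sh
# Hypothetical helper: create an AppDir skeleton with a minimal AppRun.
make_appdir() {
    appdir=$1
    mkdir -p "$appdir/usr/bin" "$appdir/usr/lib" "$appdir/usr/share/tessdata"
    # Quoted heredoc delimiter: nothing is expanded until AppRun itself runs.
    cat > "$appdir/AppRun" <<'EOF'
#!/bin/sh
# Resolve the directory this script (and so the mounted AppImage) lives in.
HERE=$(dirname "$(readlink -f "$0")")
# Search the bundled libraries before the host's (assumed to be in usr/lib).
export LD_LIBRARY_PATH="$HERE/usr/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Assumption: the language data is located via TESSDATA_PREFIX.
export TESSDATA_PREFIX="$HERE/usr/share/tessdata"
exec "$HERE/usr/bin/tesseract" "$@"
EOF
    chmod +x "$appdir/AppRun"
}
```

Calling make_appdir tesseract.AppDir would create the directories and the launcher in the current directory.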
I then created the AppImage:
[Ubuntu 19.10]$ ~/Downloads/appimagetool-x86_64.AppImage tesseract.AppDir
appimagetool, continuous build (commit effcebc), build 2084 built on 2019-05-01 21:02:41 UTC
Using architecture x86_64
/home/kingsley/Software/Tesseract/tesseract/tesseract.AppDir should be packaged as Tesseract-OCR-x86_64.AppImage
Generating squashfs...
Parallel mksquashfs: Using 6 processors
Creating 4.0 filesystem on Tesseract-OCR-x86_64.AppImage, block size 131072.
[=======================================================================================================================|] 1921/1921 100%
Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
compressed data, compressed metadata, compressed fragments, compressed xattrs
duplicates are removed
Filesystem size 73511.40 Kbytes (71.79 Mbytes)
30.95% of uncompressed filesystem size (237490.75 Kbytes)
Inode table size 5971 bytes (5.83 Kbytes)
57.29% of uncompressed inode table size (10423 bytes)
Directory table size 1019 bytes (1.00 Kbytes)
56.90% of uncompressed directory table size (1791 bytes)
Number of duplicate files found 0
Number of inodes 92
Number of files 78
Number of fragments 5
Number of symbolic links 3
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 11
Number of ids (unique uids + gids) 1
Number of uids 1
root (0)
Number of gids 1
root (0)
Embedding ELF...
Marking the AppImage as executable...
Embedding MD5 digest
Success
However, after I copied it to the older system, it would not run, reporting that libpng16.so.16 was missing.
[Ubuntu14]$ ./Tesseract-OCR-x86_64.AppImage
tesseract: error while loading shared libraries: libpng16.so.16: cannot open shared object file: No such file or directory
Further research led me to believe that I had to copy in all the dependencies manually, so I ran ldd on the tesseract executable:
[Ubuntu 19.10]$ ldd LOCAL_INSTALL/bin/tesseract
linux-vdso.so.1 (0x00007fffd7937000)
libtesseract.so.5 => not found
liblept.so.5 => /usr/lib/x86_64-linux-gnu/liblept.so.5 (0x00007f44c03d3000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f44c03b0000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f44c01c2000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f44c01a8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f44bffb7000)
libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007f44bff7d000)
libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007f44bfef8000)
libgif.so.7 => /usr/lib/x86_64-linux-gnu/libgif.so.7 (0x00007f44bfeed000)
libtiff.so.5 => /usr/lib/x86_64-linux-gnu/libtiff.so.5 (0x00007f44bfe6c000)
libwebp.so.6 => /usr/lib/x86_64-linux-gnu/libwebp.so.6 (0x00007f44bfc03000)
libopenjp2.so.7 => /usr/lib/x86_64-linux-gnu/libopenjp2.so.7 (0x00007f44bfbad000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f44bfa5c000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f44bfa40000)
/lib64/ld-linux-x86-64.so.2 (0x00007f44c0706000)
libzstd.so.1 => /usr/lib/x86_64-linux-gnu/libzstd.so.1 (0x00007f44bf999000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f44bf972000)
libjbig.so.0 => /usr/lib/x86_64-linux-gnu/libjbig.so.0 (0x00007f44bf764000)
I then copied all those shared libraries into tesseract.AppDir/usr/lib/ and rebuilt the AppImage.
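The copy step can be scripted rather than done by hand. The following is a rough sketch under stated assumptions: resolved_libs and bundle_libs are hypothetical helper names, and the parsing relies on ldd printing resolved entries as "name => /path (address)", as in the listing above. Entries reported as "not found" (libtesseract.so.5 here) are skipped and still need to be copied from wherever they were built.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin, print the resolved library paths.
# Lines without a "=> /path" part (linux-vdso, the ld-linux loader, and
# "=> not found" entries) are skipped.
resolved_libs() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Hypothetical helper: copy every resolved dependency of a binary into a directory.
# cp -L follows symlinks, so the real file is stored under the soname.
# Note: this copies everything ldd resolves, libc.so.6 included.
bundle_libs() {
    binary=$1
    dest=$2
    mkdir -p "$dest"
    ldd "$binary" | resolved_libs | while IFS= read -r lib; do
        cp -L "$lib" "$dest/"
    done
}
```

With the paths used in this question, the call would be: bundle_libs LOCAL_INSTALL/bin/tesseract tesseract.AppDir/usr/lib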
Testing on Ubuntu 14 still failed:
[Ubuntu14]$ ./Tesseract-OCR-x86_64.AppImage
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
EDIT: I retried making the AppImage, adding the missing .so files one by one. Only when I finally copied in libc.so.6 did I get the segmentation fault. However, if I leave this library out, the executable fails with:
tesseract: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.22' not found (required by /tmp/.mount_Tesser6wDkZB/lib/liblept.so.5)
It seems that liblept.so.5 is the problem: it requires GLIBC_2.22, which the libc on the older system does not provide.
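One way to check this kind of mismatch before copying anything to the old machine is to list the glibc symbol versions a library was linked against. The sketch below is an illustration, not a definitive diagnostic: max_glibc is a hypothetical helper that reads text such as the output of objdump -T and prints the highest GLIBC_x.y version mentioned in it.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin (for example objdump -T output)
# and print the highest GLIBC_x.y[.z] symbol version mentioned in it.
# Non-numeric tags such as GLIBC_PRIVATE are ignored.
max_glibc() {
    grep -o 'GLIBC_[0-9][0-9.]*' |
        sed 's/^GLIBC_//' |
        sort -u -t. -k1,1n -k2,2n -k3,3n |
        tail -n 1
}
```

On the build machine, objdump -T tesseract.AppDir/usr/lib/liblept.so.5 | max_glibc gives the newest glibc version that library needs. On the target, ldd --version reports the installed glibc version. If the first number is higher than the second, the library cannot load against the host's libc.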
Now I'm pretty much out of ideas.
- Is this not a use case for AppImages?
- Is there a way to debug what's going wrong?
- Is there a tool that automatically finds the dependencies?
- Is Ubuntu 14.01 just too old a target, so that I should give up and go back to using gocr?