
I am having trouble installing Tesseract for C++ development on Windows 10.

Can anyone provide a guide to get:
1. Leptonica (required by tesseract) lib and includes
2. Tesseract lib and includes
3. Link both to project (e.g. Visual Studio)

so that the example from https://github.com/tesseract-ocr/tesseract/wiki/APIExample works:

#include <cstdio>   // fprintf, printf
#include <cstdlib>  // exit
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used object and release memory
    api->End();
    delete api;
    delete[] outText;
    pixDestroy(&image);

    return 0;
}
Chris

4 Answers


I spent a couple of days trying to link the Tesseract library to my C++ project in Visual Studio 2019, and I finally managed to do it. None of the threads I found, nor the official Tesseract documentation, had a full list of instructions.

I'll list what I did; hopefully it will help someone. I don't claim it's the optimal way to do it.

  1. There are basic tips in the official Tesseract documentation. Go to the "Windows" section. I installed sw and cppan, but I guess that wasn't necessary. The main thing here is installing vcpkg. It requires Git, so I installed that first. Then:

    > cd c:\tools (I installed it in c:\tools, you may choose any dir)

    > git clone https://github.com/microsoft/vcpkg

    > .\vcpkg\bootstrap-vcpkg.bat

    > .\vcpkg\vcpkg install tesseract:x64-windows-static (I used x64 version)

    > .\vcpkg\vcpkg integrate install

At this point everything should work, according to the documentation: headers should be found and libs should be linked. But neither worked for me.

  1. Change project configuration to Release x64 (or Release x86 if you installed x86 tesseract).

  2. To include headers: Project properties -> C/C++ -> General. Set Additional Include Directories to C:\tools\vcpkg\installed\x64-windows-static\include (or wherever you installed vcpkg)

  3. To link libraries: Project properties -> Linker -> General. Set Additional Library Directories to C:\tools\vcpkg\installed\x64-windows-static\lib

  4. Project properties -> C/C++ -> Code Generation. Set Runtime Library to Multi-threaded (/MT). Otherwise I got errors like "runtime mismatch static vs DLL"

  5. The Tesseract lib couldn't link to its dependencies, so I added all the libs that had been installed to C:\tools\vcpkg\installed\x64-windows-static\lib. Project properties -> Linker -> Input. I set Additional Dependencies to archive.lib;bz2.lib;charset.lib;gif.lib;iconv.lib;jpeg.lib;leptonica-1.80.0.lib;libcrypto.lib;libpng16.lib;libssl.lib;libwebpmux.lib;libxml2.lib;lz4.lib;lzma.lib;lzo2.lib;openjp2.lib;tesseract41.lib;tiff.lib;tiffxx.lib;turbojpeg.lib;webp.lib;webpdecoder.lib;webpdemux.lib;xxhash.lib;zlib.lib;zstd_static.lib;%(AdditionalDependencies)

And after that it finally compiled and launched.

But... api->Init returned -1. To work with Tesseract you need a tessdata directory containing the .traineddata files for the languages you use.

  1. Download tessdata. I got it from the official docs. BTW, tessdata_fast worked better than tessdata_best for my purposes :) I downloaded the single "eng" file and saved it as C:\tools\TesseractData\tessdata\eng.traineddata.

  2. Then I added the environment variable TESSDATA_PREFIX with the value C:\tools\TesseractData\tessdata. I also added C:\tools\TesseractData to the Path variable (just in case)

And after all this it is finally working for me.

Nick

Install vcpkg (Microsoft's package manager for open-source libraries on Windows) and run a PowerShell command like .\vcpkg install tesseract:x64-windows-static. Dependency libraries such as Leptonica will be installed automatically. Tesseract can then be integrated into your Visual Studio projects with .\vcpkg integrate install.

seccpur

Additionally, I found that you also have to install lzo2.lib with: ./vcpkg install lzo:x64-windows-static. Then add lzo2.lib to the linker input as described by @Nick.

Some of the libraries listed above are no longer installed with the latest versions of Tesseract. VS2019 will complain about them if you simply copy the list; remove the ones that are no longer needed by cross-checking the list against the contents of your lib directory.

For example, tiffxx.lib, xxhash.lib and some others.

NeverSleeps
rt7085

In an MSYS2 shell, run this command:
pacman -S mingw-w64-{i686,x86_64}-tesseract-ocr