I'm trying to set some Tesseract parameters using the python-tesseract wrapper, but for Init Only parameters I'm unable to do so.

I've been reading the Tesseract documentation, and it seems I must use Init() to set these. This is what the SetVariable documentation says about it:

Only works for non-init variables (init variables should be passed to Init()).

So the Init() function has this signature:

const char *datapath,
const char *language,
OcrEngineMode oem,
char **configs,
int configs_size,
const GenericVector<STRING> *vars_vec,
const GenericVector<STRING> *vars_values,
bool set_only_non_debug_params

and my code is the following:

import tesseract

configVec =     ['user_words_suffix',   'load_system_dawg',     'load_freq_dawg']
configValues =  ['brands',              '0',                    '0']

api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_TESSERACT_ONLY, None, 0, configVec, configValues, False)
api.SetPageSegMode(tesseract.PSM_AUTO_OSD)
api.SetVariable("tessedit_char_whitelist", "€$0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.\"-/+%")

The problem is that I get the following error:

NotImplementedError: Wrong number or type of arguments for overloaded function 'TessBaseAPI_Init'.
  Possible C/C++ prototypes are:
    tesseract::TessBaseAPI::Init(char const *,char const *,tesseract::OcrEngineMode,char **,int,GenericVector< STRING > const *,GenericVector< STRING > const *,bool)

The issue is related to those GenericVectors. If I use this line instead:

api.Init(".","eng",tesseract.OEM_TESSERACT_ONLY, None, 0, None, None, False)

it works. So the problem is those GenericVectors. How can I pass the correct parameters to Init()?

Is there any other way to set the init-only parameters in the code? Could I load a config file containing these parameters from the code?

Thank you for your time, any help is greatly appreciated.

tiagosilva

1 Answer

For my scenario, which was interfacing directly with the API, I did the following:

# Declare this inside the cffi cdef
BOOL TessBaseAPISetVariable(TessBaseAPI *handle, const char *name, const char *value);

# Call it afterwards, outside the cdef
# baseapi.h - params (aka variables) must be set after the Init call
# tesseractclass.h - has the list of settable variables, such as tessedit_char_whitelist
foundVariableName = libtess.TessBaseAPISetVariable(api, 'tessedit_char_whitelist'.encode(), 'ABFGJKLMNOPRSTYZ1234567890/.,-+ |\\'.encode())
print(foundVariableName)  # 1 if the variable name was found, 0 if it was not
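
For the init-only parameters in the question, a config file may be another route, since Tesseract config files are plain text with one "name value" pair per line. The sketch below only writes such a file from the two lists in the question; the helper name write_tesseract_config and the file prefix and suffix are made up for illustration, and I have not confirmed that the python-tesseract wrapper accepts a Python list for the configs argument of Init().

```python
import os
import tempfile


def write_tesseract_config(names, values, directory=None):
    """Write 'name value' lines to a new config file and return its path.

    Hypothetical helper: it is not part of Tesseract or any wrapper.
    """
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    # mkstemp creates the file safely and returns an open descriptor plus path
    fd, path = tempfile.mkstemp(prefix="tess_init_", suffix=".cfg",
                                dir=directory, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for name, value in zip(names, values):
            handle.write(f"{name} {value}\n")
    return path


# Same parameters as in the question
config_vec = ["user_words_suffix", "load_system_dawg", "load_freq_dawg"]
config_values = ["brands", "0", "0"]

config_path = write_tesseract_config(config_vec, config_values)
print(config_path)
```

The resulting path could then be given to Tesseract as a config file, either through the configs argument of Init() if the binding supports it, or as a trailing config argument on the tesseract command line.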
abelito