Android Tesseract API to recognize non-words

Question

I'm trying to recognize random characters on Android with the tess-two API. I have a printed sheet of paper with the string "5XqaLB".
When I show parts of the string to the camera, I get the following results:

 original -> result
  "5XqaLB" -> "5anLB"  
  "XqaLB" -> "anLB"  
  "qaLB" -> "qaLB"  
  "5Xq" -> "5Xq"

I suppose this happens because Tesseract tries to guess a word from the recognized characters. I have searched a lot but can't find a solution. Does anyone have ideas for avoiding these replacements?

I have already tried a whitelist, a blacklist, and config variables like:

baseApi.setVariable("load_system_dawg", "0");
baseApi.setVariable("load_freq_dawg", "0");
baseApi.setVariable("load_punc_dawg", "0");
baseApi.setVariable("load_number_dawg", "0");
baseApi.setVariable("load_unambig_dawg", "0");
baseApi.setVariable("load_bigram_dawg", "0");
baseApi.setVariable("load_fixed_length_dawgs", "0");
baseApi.setVariable("segment_penalty_garbage", "0");
baseApi.setVariable("segment_penalty_dict_nonword", "0");
baseApi.setVariable("segment_penalty_dict_frequent_word", "0");
baseApi.setVariable("segment_penalty_dict_case_ok", "0");
baseApi.setVariable("segment_penalty_dict_case_bad", "0");

Can anyone suggest how to make Tesseract recognize only plain characters?

Jeferson Oliveira Fernande · Answer 1 · 2015-12-15T12:39:47.060

I managed to solve a similar problem I was having. In my case I was recognizing license plate characters. Instead of running Tesseract on the entire plate image, I added a preprocessing step that separates the characters, so I could run Tesseract on each character separately. My config variables:

final TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(TESSBASE_PATH, DEFAULT_DIC, TessBaseAPI.OEM_DEFAULT);
baseApi.setDebug(true);
// Restrict the output to uppercase letters and digits.
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "ABCDEFGHIJKLMNOPQRSTUVXWYZ1234567890");

// Each image passed to Tesseract contains exactly one character.
baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_CHAR);
baseApi.setVariable("load_system_dawg", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("load_freq_dawg", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("load_punc_dawg", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("load_number_dawg", TessBaseAPI.VAR_TRUE);
baseApi.setVariable("load_unambig_dawg", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("load_bigram_dawg", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("load_fixed_length_dawgs", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("segment_penalty_garbage", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("segment_penalty_dict_nonword", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("segment_penalty_dict_frequent_word", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("segment_penalty_dict_case_ok", TessBaseAPI.VAR_FALSE);
baseApi.setVariable("segment_penalty_dict_case_bad", TessBaseAPI.VAR_FALSE);
return baseApi;
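
The answer does not say how the characters were separated. As a rough illustration only, one simple approach is a vertical projection profile: scan the binarized image column by column and treat every run of columns that contains ink as one character. The sketch below assumes the image has already been thresholded into a boolean grid, and the class and method names (CharSegmenter, findCharColumns) are hypothetical, not part of tess-two or OpenCV. It will not split characters that touch each other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of tess-two): finds the horizontal extent of
// each character in a binarized image using a vertical projection profile.
class CharSegmenter {

    /**
     * @param ink ink[y][x] is true where the pixel belongs to a glyph
     * @return column ranges {startX, endXExclusive}, ordered left to right
     */
    static List<int[]> findCharColumns(boolean[][] ink) {
        List<int[]> ranges = new ArrayList<>();
        if (ink.length == 0) {
            return ranges;
        }
        int width = ink[0].length;
        int start = -1; // -1 means we are currently inside a gap
        for (int x = 0; x < width; x++) {
            boolean columnHasInk = false;
            for (boolean[] row : ink) {
                if (row[x]) {
                    columnHasInk = true;
                    break;
                }
            }
            if (columnHasInk && start < 0) {
                start = x; // a new character begins at this column
            } else if (!columnHasInk && start >= 0) {
                ranges.add(new int[] {start, x}); // the character ended
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, width}); // character touches the right edge
        }
        return ranges;
    }
}
```

On Android, each returned range could then be cropped out with Bitmap.createBitmap(source, x, y, width, height) and passed to baseApi.setImage(...) followed by baseApi.getUTF8Text(), with the page segmentation mode set to PSM_SINGLE_CHAR as in the configuration above.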

Sorry, people! I have a similar problem, but in my case I use tess-two with OpenCV to recognize plate numbers. I tried r3v3r53's method without success. My next step would be to compile traineddata files following the Tesseract training tutorial (https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3), but I'm open to suggestions (I don't understand that tutorial very well, so I'm not sure it will help). For example, an image shows "FPD 0246" but the result is "FPJ 0245": http://i.stack.imgur.com/Sp7zn.png — Jeferson Oliveira Fernande, Dec 15 '15 at 12:43
You really shouldn't use comments or the answerbox to ask a new question. — rene, Jan 01 '16 at 21:54
