
I am working on code for online handwriting recognition. It uses the CTC loss function and Word Beam Search (custom implementation by githubharald).

TF Version: 1.14.0

Following are the parameters used:

batch_size: 128
total_epoches: 300
hidden_unit_size: 128
num_layers: 2
input_dims: 10 (number of input Features)
num_classes: 80 (CTC output logits)
save_freq: 5
learning_rate: 0.001
decay_rate: 0.99
momentum: 0.9
max_length: 1940.0 (BLSTM with variable-length time steps)
label_pad: 63

The problem I'm facing is that after changing the decoder from the CTC Greedy Decoder to Word Beam Search, my code stalls after a particular step. It never shows the output of the first epoch and has now been stuck for about 5-6 hours.

The step it is stuck after: tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10

I am using a Nvidia DGX-2 for training (name: Tesla V100-SXM3-32GB)

  • You should not use word beam search in the training code, as it is slower than best path decoding. However, it should still not take 5-6h (unless you use some of the more complex modes in combination with a very large language model; in the simple "Words" mode it should definitely not take that long). – Harry Dec 20 '19 at 20:22
  • Hi, thanks for the answer. Okay, so I am sampling during my validation and dumping the RNN output after every epoch for a particular batch (say batch 5 every time). Is it fine to use the Word Beam Search decoder (with the NGramsForecast mode) over this dumped RNN output as a testing measure for C.E.R. (Character Error Rate)? Is this strategy legit? – mastershot201 Dec 21 '19 at 19:25

1 Answer


Here is the paper describing word beam search, maybe it contains some useful information for you (I'm the author of the paper).

I would look at your task as two separate parts:

  1. optical model, i.e. train a model that is as good as possible at reading text just by "looking" at it
  2. language model, i.e. use a large enough text corpus, use a fast enough mode of the decoder

To select the best model for part (1), using best path (greedy) decoding for validation is good enough. If the best path contains wrong characters, chances are high that beam search will not be able to recover either (even when using a language model).
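Best path decoding is simple enough to sketch in a few lines. Here is a minimal, self-contained illustration (not the TensorFlow implementation; it assumes the blank label is at index 0 and takes per-frame class scores as plain lists): take the argmax at each time step, collapse repeated labels, then drop blanks.

```python
def best_path_decode(logits, blank=0):
    """Greedy CTC decoding: argmax per time step, collapse repeats, drop blanks."""
    # argmax at each time step
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded, prev = [], None
    for label in best:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# toy example: 3 classes (0 = blank), 5 time steps
logits = [
    [0.1, 0.8, 0.1],    # -> 1
    [0.1, 0.8, 0.1],    # -> 1 (repeat, collapsed)
    [0.9, 0.05, 0.05],  # -> blank
    [0.1, 0.1, 0.8],    # -> 2
    [0.1, 0.1, 0.8],    # -> 2 (repeat, collapsed)
]
print(best_path_decode(logits))  # [1, 2]
```

Note that repeats are only collapsed when they are adjacent: a blank between two identical labels keeps both, which is exactly why CTC needs the blank symbol.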

Now to part (2). Regarding the runtime of word beam search: you are using "NGramsForecast" mode, which is the slowest of all modes. It has a running time of O(W*log(W)), with W being the number of words in the dictionary, while "NGrams" has O(log(W)). If you look into the paper and go to Table 1, you see that the runtime gets much worse when using the forecast modes ("NGramsForecast" or "NGramsForecastAndSample"), while the character error rate may or may not get better (e.g. "Words" mode has 90ms runtime, while "NGramsForecast" takes over 16s on the IAM dataset).
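The core idea of the "Words" mode can be illustrated with a toy prefix-pruned beam search. This is emphatically not the repository's implementation (the real decoder also handles CTC blanks and repeats, non-word characters, and language model scoring); it only shows how constraining beams to dictionary prefixes keeps the search space small. Each frame here is a hypothetical dict mapping a character (or '' for "don't extend") to a probability:

```python
def words_mode_sketch(frames, dictionary, beam_width=2):
    """Toy dictionary-constrained beam search: beams whose text is not a
    prefix of some dictionary word are pruned immediately."""
    # all prefixes of all dictionary words (including the empty prefix)
    prefixes = {w[:i] for w in dictionary for i in range(len(w) + 1)}
    beams = {"": 1.0}
    for frame in frames:
        nxt = {}
        for text, score in beams.items():
            for ch, p in frame.items():
                ext = text + ch
                if ext in prefixes:  # prune beams that leave the dictionary
                    nxt[ext] = nxt.get(ext, 0.0) + score * p
        # keep only the best beam_width beams
        beams = dict(sorted(nxt.items(), key=lambda kv: -kv[1])[:beam_width])
    return max(beams, key=beams.get)

frames = [
    {"t": 0.6, "c": 0.4},
    {"h": 0.7, "": 0.3},
    {"e": 0.9, "": 0.1},
]
print(words_mode_sketch(frames, ["the", "cat"]))  # "the"
```

In the second frame, the extension "ch" is discarded because it is not a prefix of any dictionary word; this pruning is what keeps "Words" mode fast compared to the forecast modes, which additionally score possible word completions with the n-gram language model.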

For practical use cases, I suggest the following:

  • if you have a dictionary (that means, a list of unique words), then use "Words" mode
  • if you have a large text corpus containing enough sentences in the target language, then use "NGrams" mode
  • don't use the forecast modes; instead use "Words" or "NGrams" mode and increase the beam width if you need a better (lower) character error rate
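Since the comparison between decoders hinges on the character error rate, here is a minimal way to compute it (a hypothetical helper, not part of the decoder): CER is the edit (Levenshtein) distance between the decoded text and the ground truth, divided by the ground-truth length.

```python
def cer(pred, truth):
    """Character error rate: Levenshtein distance / len(truth)."""
    # standard dynamic-programming edit distance, one row at a time
    prev = list(range(len(truth) + 1))
    for i, pc in enumerate(pred, 1):
        cur = [i]
        for j, tc in enumerate(truth, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (pc != tc)))    # substitution
        prev = cur
    return prev[-1] / len(truth)

print(cer("hallo world", "hello world"))  # 1/11 ~ 0.0909
```

Comparing CER computed this way on a fixed validation batch, once with best path decoding and once with word beam search, makes the effect of the decoder (and the corpus it uses) directly visible.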
Harry
  • Hey! So I tried using the Greedy Decoder to quickly brute-force evaluate my model on online character recognition of the IAM-OnDB (stroke-level labelled sentences). I am getting a CER of 12.7 with it. However, when I run CTCWordBeamSearch on the test set with the following parameters: LM smoothing of 0.01, beam widths of [5, 25, 30, 50], and "NGrams" mode, the CER that I get is as high as 47%. Is there something that I am doing wrong? – mastershot201 Dec 22 '19 at 22:38
  • which text corpus are you using? The text from the test set (so, each word you want to recognize is included in the text)? If so, the CER should be much lower. Could you also try Words mode? – Harry Dec 26 '19 at 17:05
  • I am using the Iam Dataset text as my corpus text. I picked the corpus from the Simple HTR repository itself (link below). https://github.com/githubharald/SimpleHTR/blob/master/data/corpus.txt – mastershot201 Dec 27 '19 at 23:53
  • the corpus file of SimpleHTR is extracted from the IAM offline-dataset (training and validation), while you are using IAM online-dataset. So you should create your own corpus or dictionary from the online-dataset, as the texts differ to my knowledge. – Harry Dec 28 '19 at 14:39
  • Hey! I read your paper and had a small query. You mentioned the rudimentary LM for which the dictionary of words along with train+validate set could be used. As far as words mode is concerned, I believe this LM would do a good job. However, since the context between words is only limited to the use of training data(and not the dictionary), would it be possible to use this corpus with NGrams or NGramsForecast? – mastershot201 Jan 04 '20 at 16:48