Train SyntaxNet model

Question

I am trying to train the Google SyntaxNet model for a different language, using the datasets available at http://universaldependencies.org/ and following this tutorial. I edited the syntaxnet/context.pbtxt file, but when I run the bazel script provided in the guide I get the following error:

syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. Not found: brain_pos/greedy/0/label-map)

My question is: do I have to provide this file and the others, such as fine-to-universal.map, tag-map, word-map and so on, or is the training step supposed to create them from the training dataset? And if I have to provide them, how can I build them?

Thanks in advance

score 0 · Answer 1 · answered Jun 06 '16 at 06:55

I'm trying to do the same thing as you and ran into the exact same error. It turned out that I accidentally removed the flag --compute_lexicon. I suppose that this flag takes care of creating tag-map, word-map etc. So just make sure that --compute_lexicon is enabled.
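A quick way to confirm that the lexicon step actually ran is to look for the map files in the model directory before training continues. The sketch below is only an illustration: the function name `check_lexicon` is hypothetical, the directory is the one from the error message in the question, and the file list contains only names mentioned in the question (the real lexicon may include more files).

```shell
#!/bin/sh
# Hypothetical helper: report which lexicon files are missing from a model directory.
# The file names are the ones mentioned in the question; the real set may be larger.
check_lexicon() {
  dir=$1
  missing=0
  for f in label-map tag-map word-map; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Example: the directory from the error message in the question.
check_lexicon brain_pos/greedy/0 || echo "lexicon incomplete; re-run training with --compute_lexicon"
```

If any file is reported missing, the most likely cause (per this answer) is that --compute_lexicon was dropped from the training command.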

score 0 · Answer 2 · answered Jun 08 '16 at 18:57

Well, I got a similar error and, to be honest, I never found out what the problem was, but I used this link to learn the training and testing process, and it provides useful documentation for training.

You might not have changed the format of the training, tuning and test datasets from .conllu to .conll, or the training shell script may be confused by the directories given in --arg_prefix, --output_path, --task_context or even --model_path.

score 0 · Answer 3 · answered Jul 08 '16 at 10:23

I recall having a similar error at the beginning. Did you use the exact code under 'training a parser step 1: local pretraining'? Because you will notice there's an uninitialized $PARAMS variable in there that is supposed to represent the parameters of your trained POS tagger. When you train a tagger (see earlier in the same tutorial), it will create files in models/brain_pos/greedy/$PARAMS. I believe that in your case, this $PARAMS variable was interpreted as 0 and the script is looking for a trained tagger in brain_pos/greedy/0 which it obviously does not find. If you just add a line at the beginning of the script that specifies the parameters of a trained tagger (128-0.08-3600-0.9-0 in the tutorial) it should work.

Thus:

PARAMS=128-0.08-3600-0.9-0
bazel-bin/syntaxnet/parser_trainer \
  --arg_prefix=brain_parser \
  --batch_size=32 \
  --projectivize_training_set \
  --decay_steps=4400 \
  --graph_builder=greedy \
  --hidden_layer_sizes=200,200 \
  --learning_rate=0.08 \
  --momentum=0.85 \
  --output_path=models \
  --task_context=models/brain_pos/greedy/$PARAMS/context \
  --seed=4 \
  --training_corpus=tagged-training-corpus \
  --tuning_corpus=tagged-tuning-corpus \
  --params=200x200-0.08-4400-0.85-4
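Because an unset shell variable expands silently, a mistake like this only shows up later as a missing-file error. A small guard before the trainer call can catch it earlier. This is a sketch, assuming the directory layout described in this answer (models/brain_pos/greedy/$PARAMS/context); the function name `require_tagger_context` is hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: fail early if PARAMS is empty or the tagger context is missing.
# Assumes the layout models/brain_pos/greedy/<params>/context described in the answer.
require_tagger_context() {
  params=$1
  if [ -z "$params" ]; then
    echo "PARAMS is empty; set it to the parameter string of your trained tagger" >&2
    return 1
  fi
  context="models/brain_pos/greedy/$params/context"
  if [ ! -f "$context" ]; then
    echo "no tagger context at $context" >&2
    return 1
  fi
  # Print the path so it can be passed to --task_context.
  echo "$context"
}

# Value from the tutorial, as quoted in the answer above.
PARAMS=128-0.08-3600-0.9-0
require_tagger_context "$PARAMS" || echo "train the POS tagger first, or fix PARAMS"
```

On success the function prints the context path, so its output could be captured and passed to --task_context instead of spelling the path out by hand.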
