
I am new to SyntaxNet and I tried to use the pre-trained model for the Turkish language by following the instructions here.

Point-1 : Although I set the MODEL_DIRECTORY environment variable, tokenize.sh did not find the related path and failed with the error below:

root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi." | syntaxnet/models/parsey_universal/tokenize.sh
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. **Not found: label-map**)

Point-2 : So I changed tokenize.sh, commenting out the MODEL_DIR=$1 line and hard-coding my Turkish language model path, like this:

PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
CONTEXT=syntaxnet/models/parsey_universal/context.pbtxt
INPUT_FORMAT=stdin-untoken
# MODEL_DIR=$1
MODEL_DIR=syntaxnet/models/etiya-smart-tr
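As an aside, since the original script assigns the model directory from its first positional argument (MODEL_DIR=$1), the path can also be supplied at invocation time instead of being hard-coded. A minimal stand-in script (not the real tokenize.sh, just a sketch of the mechanism) shows how the argument flows in:

```shell
# Sketch: MODEL_DIR=$1 means the first argument to the script becomes
# the model directory, so no edit to the script is strictly required.
cat > /tmp/demo_tokenize.sh <<'EOF'
MODEL_DIR=$1
echo "MODEL_DIR=$MODEL_DIR"
EOF
sh /tmp/demo_tokenize.sh syntaxnet/models/etiya-smart-tr
# prints: MODEL_DIR=syntaxnet/models/etiya-smart-tr
```

With the real script, the equivalent call would pass the model path after the script name, e.g. `... | tokenize.sh syntaxnet/models/etiya-smart-tr`.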

Point-3 : After that, when I ran it as instructed, it failed with the error below:

root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi" | syntaxnet/models/parsey_universal/tokenize.sh
I syntaxnet/term_frequency_map.cc:101] Loaded 29 terms from syntaxnet/models/etiya-smart-tr/label-map.
I syntaxnet/embedding_feature_extractor.cc:35] Features: input.char input(-1).char input(1).char; input.digit input(-1).digit input(1).digit; input.punctuation-amount input(-1).punctuation-amount input(1).punctuation-amount 
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: chars;digits;puncts
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 16;16;16
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. **Not found: syntaxnet/models/etiya-smart-tr/char-map**)

I had downloaded the Turkish package by following the link pattern indicated, download.tensorflow.org/models/parsey_universal/.zip, and the language model directory contains the files listed below:

-rw-r----- 1 root root    50646 Sep 22 07:24 char-ngram-map
-rw-r----- 1 root root      329 Sep 22 07:24 label-map
-rw-r----- 1 root root   133477 Sep 22 07:24 morph-label-set
-rw-r----- 1 root root  5553526 Sep 22 07:24 morpher-params
-rw-r----- 1 root root     1810 Sep 22 07:24 morphology-map
-rw-r----- 1 root root 10921546 Sep 22 07:24 parser-params
-rw-r----- 1 root root    39990 Sep 22 07:24 prefix-table
-rw-r----- 1 root root    28958 Sep 22 07:24 suffix-table
-rw-r----- 1 root root      561 Sep 22 07:24 tag-map
-rw-r----- 1 root root  5234212 Sep 22 07:24 tagger-params
-rw-r----- 1 root root   172869 Sep 22 07:24 word-map
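The gap can be checked directly. The sketch below recreates the shipped file list from the listing above in a scratch directory and confirms that char-map, which the character-based tokenizer tries to load, is not among the files the package provides:

```shell
# Recreate the file names shipped in the Turkish package (from ls -l above)
MODEL_DIR=$(mktemp -d)
for f in char-ngram-map label-map morph-label-set morpher-params \
         morphology-map parser-params prefix-table suffix-table \
         tag-map tagger-params word-map; do
  touch "$MODEL_DIR/$f"
done
# The tokenizer's feature extractor asks for char-map, which is absent:
[ -e "$MODEL_DIR/char-map" ] && echo "char-map: present" || echo "char-map: MISSING"
# prints: char-map: MISSING
```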

QUESTION-1: I am aware that there is no char-map file in the directory, which is why I get the error at Point-3 above. So does anyone have an idea how the Turkish language test could have been run, given that a result of 93.363% for part-of-speech, for example, was shared?

QUESTION-2: How can I find the char-map file for the Turkish language?

QUESTION-3: If there is no char-map file, must I train a model myself by following the steps in SyntaxNet's "Obtain Data" and "Training" documentation?

QUESTION-4: Is there a way to generate the word-map, char-map, etc. files? Is the well-known word2vec approach the one used to generate map files that SyntaxNet's tokenizers can process?

Serkan Arıkuşu
ehangul

1 Answer


Try this issue: https://github.com/tensorflow/models/issues/830. It contains a (for the moment temporary) solution.

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Roman Marusyk Jan 11 '17 at 08:26
  • I can't attach a patch here, so an external link is still needed. – Илья Ефимов Jan 11 '17 at 15:27
  • It seems that at the given link the user "xtknight" wrote the solution code. Thanks for sharing the link. – ehangul Jan 12 '17 at 09:18