I am new to SyntaxNet and I tried to use the pre-trained Turkish language model by following the instructions here.
Point-1: Although I set the MODEL_DIRECTORY environment variable, tokenize.sh didn't find the related path and gave the error below:
root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi." | syntaxnet/models/parsey_universal/tokenize.sh
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. **Not found: label-map**)
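Looking inside tokenize.sh, it assigns MODEL_DIR=$1, so my guess is that the script expects the model directory as its first argument rather than reading it from the environment, i.e. something like the line below (just my reading of the script, not verified against the docs):

echo "Eray eve geldi." | syntaxnet/models/parsey_universal/tokenize.sh $MODEL_DIRECTORY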
Point-2: So, to proceed, I changed tokenize.sh by commenting out MODEL_DIR=$1 and hard-coding the path to my Turkish model, like below:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
CONTEXT=syntaxnet/models/parsey_universal/context.pbtxt
INPUT_FORMAT=stdin-untoken
#MODEL_DIR=$1
MODEL_DIR=syntaxnet/models/etiya-smart-tr
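A less invasive edit would be the standard POSIX default-value idiom, which keeps the positional argument working and only falls back to my local path when no argument is given (a generic shell sketch, not something from the SyntaxNet docs):

MODEL_DIR=${1:-syntaxnet/models/etiya-smart-tr}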
Point-3: After that, when I ran it as told, it gave the error below:
root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi" | syntaxnet/models/parsey_universal/tokenize.sh
I syntaxnet/term_frequency_map.cc:101] Loaded 29 terms from syntaxnet/models/etiya-smart-tr/label-map.
I syntaxnet/embedding_feature_extractor.cc:35] Features: input.char input(-1).char input(1).char; input.digit input(-1).digit input(1).digit; input.punctuation-amount input(-1).punctuation-amount input(1).punctuation-amount
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: chars;digits;puncts
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 16;16;16
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. **Not found: syntaxnet/models/etiya-smart-tr/char-map**)
I had downloaded the Turkish package by following the indicated link pattern, download.tensorflow.org/models/parsey_universal/<language>.zip, and the mapping files in my language directory are listed below:
-rw-r----- 1 root root 50646 Sep 22 07:24 char-ngram-map
-rw-r----- 1 root root 329 Sep 22 07:24 label-map
-rw-r----- 1 root root 133477 Sep 22 07:24 morph-label-set
-rw-r----- 1 root root 5553526 Sep 22 07:24 morpher-params
-rw-r----- 1 root root 1810 Sep 22 07:24 morphology-map
-rw-r----- 1 root root 10921546 Sep 22 07:24 parser-params
-rw-r----- 1 root root 39990 Sep 22 07:24 prefix-table
-rw-r----- 1 root root 28958 Sep 22 07:24 suffix-table
-rw-r----- 1 root root 561 Sep 22 07:24 tag-map
-rw-r----- 1 root root 5234212 Sep 22 07:24 tagger-params
-rw-r----- 1 root root 172869 Sep 22 07:24 word-map
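To check which resource files the pipeline expects against what the package actually provides, comparing the file_pattern entries of the task context with the model directory should work; a rough sketch, assuming the context.pbtxt referenced by tokenize.sh declares its resources via file_pattern entries as the SyntaxNet task specs do:

grep file_pattern syntaxnet/models/parsey_universal/context.pbtxt
ls -1 syntaxnet/models/etiya-smart-tr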
QUESTION-1: I am aware that there is no char-map file in the directory, which is why I got the error described at Point-3 above. So does anyone have an idea how the Turkish model could have been evaluated, with a published result of e.g. 93.363% for part-of-speech tagging?
QUESTION-2: Where can I find the char-map file for Turkish?
QUESTION-3: If there is no char-map file, do I have to train a model myself by following the steps in SyntaxNet's "Obtain Data & Training" documentation?
QUESTION-4: Is there a way to generate the word-map, char-map, etc. files? Is it the well-known word2vec approach that is used to generate map files that can be processed with the SyntaxNet tokenizers?
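For what it's worth, my current understanding from the "Obtain Data & Training" README is that these map files are term-frequency maps built from the training corpus by the trainer itself (via its --compute_lexicon flag), not word2vec embeddings; the README's POS-tagging step looks roughly like the sketch below, with the training-corpus/tuning-corpus entries in context.pbtxt pointed at a Turkish CoNLL-U treebank. I haven't run this end-to-end, so please correct me if this is wrong:

bazel-bin/syntaxnet/parser_trainer \
  --task_context=syntaxnet/context.pbtxt \
  --arg_prefix=brain_pos \
  --compute_lexicon \
  --graph_builder=greedy \
  --training_corpus=training-corpus \
  --tuning_corpus=tuning-corpus \
  --output_path=models \
  --batch_size=32 \
  --decay_steps=3600 \
  --hidden_layer_sizes=128 \
  --learning_rate=0.08 \
  --momentum=0.9 \
  --seed=0 \
  --params=128-0.08-3600-0.9-0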