
I'm trying to generate the scorer for the DeepSpeech-Polyglot project. I have followed every step of the documentation, but when I run:

python3 /DeepSpeech/data/lm/generate_lm.py --input_txt /DeepSpeech/data_prepared/texts/${LANGUAGE}/clean_vocab.txt --output_dir /DeepSpeech/data_prepared/texts/${LANGUAGE}/ --top_k 500000 --kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie --discount_fallback

I get the following error:

Saving top 500000 words ...

Calculating word statistics ...
  Your text file has 202185630 words in total
  It has 2106729 unique words
  Your top-500000 words are 98.7433 percent of all words
  Your most common word "die" occurred 7853080 times
  The least common word in your top-k is "adamantium" with 5 times
  The first word with 6 occurrences is "begibst" at place 448270

Creating ARPA file ...
=== 1/5 Counting and sorting n-grams ===
Reading /DeepSpeech/data_prepared/texts/de/lower.txt.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Traceback (most recent call last):
  File "/DeepSpeech/data/lm/generate_lm.py", line 210, in <module>
    main()
  File "/DeepSpeech/data/lm/generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
  File "/DeepSpeech/data/lm/generate_lm.py", line 97, in build_lm
    subprocess.check_call(subargs)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/DeepSpeech/native_client/kenlm/build/bin/lmplz', '--order', '5', '--temp_prefix', '/DeepSpeech/data_prepared/texts/de/', '--memory', '85%', '--text', '/DeepSpeech/data_prepared/texts/de/lower.txt.gz', '--arpa', '/DeepSpeech/data_prepared/texts/de/lm.arpa', '--prune', '0', '0', '1', '--discount_fallback']' died with <Signals.SIGSEGV: 11>.
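One way to narrow this down is to run the failing `lmplz` command directly, so that its own output is not hidden behind the Python traceback. On POSIX systems, Python's `subprocess` reports a child that was killed by signal N as return code -N, which is why `check_call` says the process "died with <Signals.SIGSEGV: 11>". This is a minimal sketch: the argument list is copied from the traceback, while the helper names `describe_returncode` and `run_lmplz` are hypothetical.

```python
import signal
import subprocess

# Arguments copied verbatim from the failing command in the traceback.
LMPLZ_CMD = [
    "/DeepSpeech/native_client/kenlm/build/bin/lmplz",
    "--order", "5",
    "--temp_prefix", "/DeepSpeech/data_prepared/texts/de/",
    "--memory", "85%",
    "--text", "/DeepSpeech/data_prepared/texts/de/lower.txt.gz",
    "--arpa", "/DeepSpeech/data_prepared/texts/de/lm.arpa",
    "--prune", "0", "0", "1",
    "--discount_fallback",
]


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a readable message.

    On POSIX, a child killed by signal N is reported as returncode -N
    (for example -11 for SIGSEGV).
    """
    if returncode < 0:
        return f"killed by signal {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_lmplz(cmd=LMPLZ_CMD) -> int:
    # Run lmplz on its own so its stderr goes straight to the terminal.
    result = subprocess.run(cmd)
    print("lmplz", describe_returncode(result.returncode))
    return result.returncode


if __name__ == "__main__":
    run_lmplz()
```

If `lmplz` still segfaults when run on its own, the problem lies in the KenLM build rather than in `generate_lm.py`.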

I'm using this documentation: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot

I'm grateful for any hints.

Chiara

1 Answer


This has been discussed on DeepSpeech's Discourse.

Basically, your KenLM installation is broken. If you search for the error, you'll find that you have to reinstall KenLM and check your environment.
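A quick first check of the environment is whether the directory passed as `--kenlm_bins` actually contains executable KenLM tools. This is a minimal sketch: `lmplz` is named in the traceback, while `filter` and `build_binary` are an assumption about what the later steps of `generate_lm.py` call, and `missing_kenlm_binaries` is a hypothetical helper. It only catches a missing or incomplete build; a binary that is present can still crash.

```python
import os

# Tools looked up in --kenlm_bins. lmplz appears in the traceback;
# filter and build_binary are ASSUMED from the later generate_lm.py steps.
EXPECTED_TOOLS = ("lmplz", "filter", "build_binary")


def missing_kenlm_binaries(bin_dir, names=EXPECTED_TOOLS):
    """Return the tool names that are absent or not executable in bin_dir."""
    missing = []
    for name in names:
        path = os.path.join(bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    bins = "/DeepSpeech/native_client/kenlm/build/bin/"
    print("missing:", missing_kenlm_binaries(bins) or "none")
```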

Olaf
  • I'm sorry, but I don't understand the answer on Discourse. I tried to reinstall KenLM as follows: `git clone https://github.com/kpu/kenlm` `cd kenlm` `mkdir build` `cd build` `apt install build-essential cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev` `cmake ..` `make -j 4` To be sure I wasn't missing anything, I also ran `pip install https://github.com/kpu/kenlm/archive/master.zip`. It always installs `kenlm-0.0.0`. Is that normal, or did I mess something up? – Chiara Jan 15 '21 at 14:33
  • Ah, don't use the pip version. This is definitely the problem. You will have to build it yourself; try this: https://medium.com/tekraze/install-kenlm-binaries-on-ubuntu-language-model-inference-tool-33507000f33 – Olaf Jan 16 '21 at 09:31
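Note that `generate_lm.py` runs the command-line tools from the directory given by `--kenlm_bins` (in the question, `/DeepSpeech/native_client/kenlm/build/bin/`), so a freshly built copy of KenLM is only used if that flag points at its `build/bin` directory. The sketch below scripts the `cmake ..` and `make -j 4` steps from Chiara's comment and prints the resulting directory. It assumes the apt packages listed there are already installed and the repository is already cloned; `build_steps` and `build_kenlm` are hypothetical helper names.

```python
import os
import subprocess
import sys


def build_steps(src_dir: str, jobs: int = 4):
    """Return (argv, cwd) pairs for an out-of-source CMake build of KenLM.

    Mirrors `cmake ..` followed by `make -j N`, both run in <src_dir>/build.
    """
    build_dir = os.path.join(src_dir, "build")
    return [
        (["cmake", ".."], build_dir),
        (["make", "-j", str(jobs)], build_dir),
    ]


def build_kenlm(src_dir: str, jobs: int = 4) -> str:
    """Build KenLM from source and return the directory holding lmplz."""
    os.makedirs(os.path.join(src_dir, "build"), exist_ok=True)
    for argv, cwd in build_steps(src_dir, jobs):
        subprocess.check_call(argv, cwd=cwd)  # raises if a step fails
    bin_dir = os.path.join(src_dir, "build", "bin")
    lmplz = os.path.join(bin_dir, "lmplz")
    if not os.access(lmplz, os.X_OK):
        raise FileNotFoundError(f"build finished but {lmplz} is missing")
    return bin_dir


if __name__ == "__main__":
    src = sys.argv[1] if len(sys.argv) > 1 else "kenlm"
    print("pass this to generate_lm.py as --kenlm_bins:", build_kenlm(src))
```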