
I have tried to follow the advice from here, but I got this error:

C:\OpenNLP_models\tool\apache-opennlp-1.5.3-bin\apache-opennlp-1.5.3\bin>opennlp TokenizerME C:\OpenNLP_models\tool\apache-opennlp-1.5.3-bin\apache-opennlp-1.5.3\bin\thai.tok.bin < test.txt

Loading Tokenizer model ... Exception in thread "main" java.lang.NullPointerException
    at opennlp.tools.util.model.BaseModel.getManifestProperty(BaseModel.java:491)
    at opennlp.tools.util.model.BaseModel.initializeFactory(BaseModel.java:245)
    at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:237)
    at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:181)
    at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:125)
    at opennlp.tools.cmdline.tokenizer.TokenizerModelLoader.loadModel(TokenizerModelLoader.java:39)
    at opennlp.tools.cmdline.tokenizer.TokenizerModelLoader.loadModel(TokenizerModelLoader.java:31)
    at opennlp.tools.cmdline.ModelLoader.load(ModelLoader.java:62)
    at opennlp.tools.cmdline.tokenizer.TokenizerMETool.run(TokenizerMETool.java:41)
    at opennlp.tools.cmdline.CLI.main(CLI.java:225)

The test.txt file contains the sentence "ผมหิวข้าว".

Could anyone tell me how to fix it? I want to use the POSTagger. Thank you.

polm23
Music

1 Answer


I think you're missing the manifest.properties file. Can you unzip the thai.tok.bin file and check that it contains these files:

  1. token.model (binary tokenizer model)
  2. manifest.properties (configuration)

Contents of manifest.properties should be like this, taken from the question you link to:

Manifest-Version=1.0
Language=th
OpenNLP-Version=1.5.0
Component-Name=TokenizerME
useAlphaNumericOptimization=false
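
If unzipping by hand is inconvenient, the same check can be sketched in plain Java, since the model file is a zip archive. This is only an illustration using the standard `java.util.zip` classes, not part of OpenNLP; the class name `ModelManifestCheck` and the default path `thai.tok.bin` are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of OpenNLP.
class ModelManifestCheck {

    // Returns the properties stored in the archive entry named exactly
    // "manifest.properties", or null if no entry has that exact name
    // (for example when it was saved as "manifest.properties.txt").
    static Properties readManifest(String modelPath) throws IOException {
        try (ZipFile zip = new ZipFile(modelPath)) {
            ZipEntry entry = zip.getEntry("manifest.properties");
            if (entry == null) {
                return null;
            }
            Properties props = new Properties();
            try (InputStream in = zip.getInputStream(entry)) {
                props.load(in);
            }
            return props;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real model path as the first argument.
        String modelPath = args.length > 0 ? args[0] : "thai.tok.bin";

        // List every entry so a misnamed file is easy to spot.
        try (ZipFile zip = new ZipFile(modelPath)) {
            zip.stream().forEach(e -> System.out.println("entry: " + e.getName()));
        }

        Properties manifest = readManifest(modelPath);
        if (manifest == null) {
            System.out.println("manifest.properties not found; check the exact name and extension");
        } else {
            manifest.forEach((key, value) -> System.out.println(key + "=" + value));
        }
    }
}
```

Printing the entry names as well as the parsed properties makes a wrong file name or extension visible right away.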
polm23
  • My manifest.properties file is as you posted, but I got the same error. – Music Jul 25 '18 at 05:52
  • Sorry, no idea then. – polm23 Jul 25 '18 at 05:53
  • Sorry, my mistake. Your solution works; I had given manifest.properties the wrong file extension. Thank you. – Music Jul 25 '18 at 06:53
  • Can I ask one more question? I have a new problem: the output does not seem to be encoded as UTF-8: "à¸?ินอะไรยังนาย". Do you know how I can fix it? – Music Jul 25 '18 at 07:00
  • Sounds like a locale problem. What is the value of LANG and LC_ALL if you type `locale`? (Also, are you using Cygwin? I don't know much about Windows...) – polm23 Jul 25 '18 at 09:25
  • I cannot type LANG or LC_ALL in the command window, and I am not using Cygwin. My workaround for now is to run it in Eclipse. Thanks for your attention, and sorry for the very late reply. – Music Jul 26 '18 at 09:43
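
The leading "à¸" in the garbled output looks like the UTF-8 bytes of a Thai character being interpreted in a single-byte code page. The thread does not confirm the cause, so the following is only a sketch under that assumption. It reproduces the effect with standard JDK classes and shows one way to print through a `PrintStream` that encodes as UTF-8. The class name `Utf8OutputSketch` is hypothetical, and ISO-8859-1 stands in for whatever code page the Windows console was using.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of OpenNLP.
class Utf8OutputSketch {

    // Encodes the text as UTF-8 and then decodes those bytes with a different
    // charset, which is how mojibake such as "à¸..." typically arises.
    static String misdecode(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The test sentence from the question, written with Unicode escapes so
        // the source file compiles the same way under any source encoding.
        String thai = "\u0e1c\u0e21\u0e2b\u0e34\u0e27\u0e02\u0e49\u0e32\u0e27";

        // The charset the JVM falls back to when none is specified.
        System.out.println("default charset: " + Charset.defaultCharset());

        // What the sentence looks like when its UTF-8 bytes are misread
        // (ISO-8859-1 is an assumed stand-in for the console code page).
        System.out.println("mis-decoded: " + misdecode(thai, StandardCharsets.ISO_8859_1));

        // Writing through a PrintStream that encodes as UTF-8 explicitly.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(thai);
    }
}
```

Whether the UTF-8 line displays correctly still depends on the console itself. On Windows, switching the console code page with `chcp 65001` or starting the JVM with `-Dfile.encoding=UTF-8` may be worth trying, though neither was tested in this thread.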