I am using CoreNLP to calculate the sentiment of a given text. I have successfully run it for English, and I need to do the same for other languages, such as Hindi. How can I train the system and use it for other languages? Below is the code for English:

// Build a pipeline; the sentiment annotator works on the trees produced by "parse".
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "I love the display of iPhone but hate its battery life";
Annotation annotation = pipeline.process(text);
for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
   // Parse tree with a predicted sentiment class attached to each node
   Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
   // Class predicted for the root node: 0 (very negative) to 4 (very positive)
   int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
   System.out.println(sentiment);
}
Hari
  • A PTB-format dataset is required to train the system. I can see that training can be done from the command line using: $ java -cp "*" edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz But how do I use this for other languages? – Hari Apr 16 '14 at 04:01
  • Once I have the PTB file, I would like to know whether there are parameters that tell the system it is being trained on Hindi and that the input text is Hindi, so that it performs sentiment analysis for Hindi. – Hari Apr 16 '14 at 04:30

1 Answer

Information on training the Stanford NLP RNTN (Recursive Neural Tensor Network) is provided by mbatchkarov.
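To make the training input concrete: the sentiment trainer reads one binarized tree per line, with every node carrying a class label from 0 (very negative) to 4 (very positive). Below is a minimal sketch of a sanity check you could run over a Hindi training file before training. The class name, the method name, the Hindi sentence and its labels are all made up for illustration, and the check is deliberately simplified (it assumes literal brackets in the text have already been escaped, as in PTB-style data).

```java
public class SentimentTreeCheck {

    /**
     * Simplified check: parentheses must balance, and every opening
     * parenthesis must be followed by a label 0..4 and a space.
     * Hypothetical helper, not part of CoreNLP.
     */
    static boolean looksValid(String line) {
        if (line.isEmpty()) {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '(') {
                depth++;
                // Need at least a label character and a space after "(".
                if (i + 2 >= line.length()) {
                    return false;
                }
                char label = line.charAt(i + 1);
                if (label < '0' || label > '4' || line.charAt(i + 2) != ' ') {
                    return false;
                }
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // closing bracket without a matching open
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        // Illustrative labels only; real labels come from human annotation.
        String hindi = "(3 (2 यह) (3 (2 फ़िल्म) (3 (3 अच्छी) (2 है))))";
        System.out.println(looksValid(hindi));
        System.out.println(looksValid("(3 (2 I) (5 bad))"));
    }
}
```

Nothing in this format is language-specific. The hard part for Hindi is producing the labeled trees in the first place, which needs both a parser and sentiment annotations for every phrase.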

In general, non-English sentiment analysis is still a work in progress, and many methods, especially those that go beyond bag-of-words, may need to be substantially rethought before they can be applied to another language. For example, agglutinative or heavily compounding languages like Turkish or German (i.e. languages that pack a lot of words or morphemes into one long word) are not going to work that well with a number of text-mining and sentiment analysis techniques.

Try searching for "sentiment analysis for Hindi". One interesting paper I found was this one by Mittal et al.
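As for the "parameters" asked about in the comments: as far as I know there is no language switch. The pipeline just loads whichever model files you point it at, so the rough idea is to pass your own models through properties. Here is a minimal sketch, assuming you have already trained model.ser.gz as in the command above and that you have a constituency parser model for Hindi (CoreNLP does not ship one to my knowledge; the file name hindiParser.ser.gz is hypothetical). The property names parse.model and sentiment.model are the ones I believe CoreNLP reads, but check them against the version you are using.

```java
import java.util.Properties;

public class HindiSentimentConfig {

    /** Builds pipeline properties that point at custom model files. */
    static Properties buildProps(String parserModel, String sentimentModel) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        // Assumption: a parser model trained for the target language.
        props.setProperty("parse.model", parserModel);
        // The model produced by SentimentTraining (-model model.ser.gz).
        props.setProperty("sentiment.model", sentimentModel);
        return props;
    }

    public static void main(String[] args) {
        // "hindiParser.ser.gz" is a hypothetical file name.
        Properties props = buildProps("hindiParser.ser.gz", "model.ser.gz");
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " = " + props.getProperty(key));
        }
        // With CoreNLP on the classpath, the pipeline would then be created
        // as in the question: new StanfordCoreNLP(props)
    }
}
```

The rest of the code in the question stays the same. Note that the default tokenizer is geared towards English, so tokenization for Hindi is a separate problem you would need to look at as well.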

webelo