stanford corenlp sentiment training set

Question

I am new to NLP, and to sentiment analysis in particular. My goal is to train the Stanford CoreNLP sentiment model. I understand that the training sentences must be provided in the following format:

(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2 to) (2 (2 be) (2 (2 the) (2 (2 21st) (2 (2 (2 Century) (2 's)) (2 (3 new) (2 (2 ``) (2 Conan)))))))) (2 '')) (2 and)) (3 (2 that) (3 (2 he) (3 (2 's) (3 (2 going) (3 (2 to) (4 (3 (2 make) (3 (3 (2 a) (3 splash)) (2 (2 even) (3 greater)))) (2 (2 than) (2 (2 (2 (2 (1 (2 Arnold) (2 Schwarzenegger)) (2 ,)) (2 (2 Jean-Claud) (2 (2 Van) (2 Damme)))) (2 or)) (2 (2 Steven) (2 Segal))))))))))))) (2 .)))
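The format above is a binarized parse tree in which every node, including each word, carries a sentiment label from 0 (very negative) to 4 (very positive). To sanity-check your own training lines before running SentimentTraining, a minimal Python sketch of a parser/validator might look like the one below. The names `Node`, `parse` and `words` are hypothetical and not part of CoreNLP (which reads these files with its own tree reader), and the sketch assumes, like the example, that every internal node has exactly two children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: int                      # sentiment class, 0 (very negative) to 4 (very positive)
    word: Optional[str] = None      # set only on leaves
    children: List["Node"] = field(default_factory=list)


def tokenize(line: str) -> List[str]:
    # Split into "(", ")" and atoms; tokens such as `` and 's stay atoms.
    return line.replace("(", " ( ").replace(")", " ) ").split()


def parse(line: str) -> Node:
    """Parse one tree (hypothetical helper); raise ValueError if malformed."""
    tokens = tokenize(line)
    pos = 0

    def peek() -> str:
        if pos >= len(tokens):
            raise ValueError("unexpected end of tree")
        return tokens[pos]

    def expect(tok: str) -> None:
        nonlocal pos
        if peek() != tok:
            raise ValueError(f"expected {tok!r} at token {pos}, got {peek()!r}")
        pos += 1

    def parse_node() -> Node:
        nonlocal pos
        expect("(")
        label = int(peek())  # a non-integer label raises ValueError
        if not 0 <= label <= 4:
            raise ValueError(f"label {label} is outside 0-4")
        pos += 1
        if peek() == "(":
            node = Node(label)
            while peek() == "(":
                node.children.append(parse_node())
            # Assumption: trees must be binarized, as in the example.
            if len(node.children) != 2:
                raise ValueError(f"node has {len(node.children)} children, expected 2")
        else:
            node = Node(label, word=peek())  # leaf: (LABEL token)
            pos += 1
        expect(")")
        return node

    root = parse_node()
    if pos != len(tokens):
        raise ValueError("extra tokens after the tree")
    return root


def words(node: Node) -> List[str]:
    # Leaves, read left to right, give back the tokenized sentence.
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in words(child)]
```

For example, `parse("(3 (2 The) (4 (2 movie) (4 rocks)))")` returns a root labeled 3 whose leaves are "The", "movie" and "rocks".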

I also know that I can train a sentiment model on my own training data with the following command:

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz

My question is: is the training data set that was used to train the model available? If so, where can I find it? Also, is there a way to append new sentences to the original training data set and train a model on the combined data?

See [How to train the Stanford NLP Sentiment Analysis tool](http://stackoverflow.com/questions/22586658/how-to-train-the-stanford-nlp-sentiment-analysis-tool). — Wiktor Stribiżew, Mar 02 '17 at 08:22

score 0 · Answer 1 · answered Mar 03 '17 at 23:44

The data is available here: http://nlp.stanford.edu/sentiment/

If you create a new data set in the same format, you can put the files (for example, the original training file plus your new ones) in a directory and set -trainPath to that directory. SentimentTraining will load all files in that directory and train on them.

Sample command:

java -Xmx8g edu.stanford.nlp.sentiment.SentimentTraining -train -numHid 25 -trainPath trees/training-data/ -model model.ser.gz
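To set up a directory like the one in the sample command, one option is a small Python sketch that copies the original training file and your own tree file into it after a cheap check on each line. The function names and the paths `trees/train.txt` and `my-trees.txt` are hypothetical assumptions (your download may be laid out differently), and the check only verifies that each non-blank line is a single balanced, parenthesized tree; it does not validate labels or binarization.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def looks_like_tree(line: str) -> bool:
    # Cheap check: one parenthesized expression with balanced parentheses.
    s = line.strip()
    if not (s.startswith("(") and s.endswith(")")):
        return False
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            # Depth may only return to zero at the very last character.
            if depth < 0 or (depth == 0 and i != len(s) - 1):
                return False
    return depth == 0


def build_training_dir(sources: Iterable[str], out_dir: str) -> List[str]:
    """Copy tree files into out_dir (hypothetical helper); return its file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in map(Path, sources):
        dest = out / src.name
        if dest.exists():
            # Refuse to silently overwrite a file with the same name.
            raise FileExistsError(dest)
        text = src.read_text(encoding="utf-8")
        bad = [n for n, line in enumerate(text.splitlines(), start=1)
               if line.strip() and not looks_like_tree(line)]
        if bad:
            raise ValueError(f"{src}: malformed tree on line(s) {bad[:5]}")
        shutil.copyfile(src, dest)
    return sorted(p.name for p in out.iterdir())
```

For example, `build_training_dir(["trees/train.txt", "my-trees.txt"], "trees/training-data")` would prepare the directory passed to -trainPath in the sample command. Since every file in that directory is loaded as training data, keep dev and test files out of it.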