I have a list of comments made by executives. No two are likely to be identical. Each one reflects the overall sentiment about the company's performance. My objective is to use past comments to train a classifier that labels future comments as positive or negative. Is this possible, and which techniques would help me achieve it? Any help is much appreciated. I have included some sample comments below:

“Business [is] improving and lead times are extending by two or more weeks.”

“Very positive outlook for this quarter. Production goals have been adjusted multiple times and increased each time due to demand.”

“Product demand continues to be solid.”

“Bookings are heavy early in the season. Expect robust first half of the year.”

“Demand still outstrips capacity. Competitors have announced heavy capital investments to increase capacity.”

“Sales and business continue to be strong and increasing.”

“Business holding steady in Q1.”

“Medical device manufacturing is still strong.”

“Even though oil and gas prices are on the upswing, we still face a tough 2017 and will continue to save on costs.”

“Major focus on commodities and potential [for] further inflation.”

prashanth manohar
  • This is like a whole thesis... One way to start is to tokenise your sentences and keep lists of positive and negative words. Then use some basic counts to see how many positive vs. negative words show up in each comment. – Rohan Mar 14 '17 at 01:50
  • Should I manually label the words as positive and negative or is there a dictionary I can use? – prashanth manohar Mar 14 '17 at 01:52
  • @prashanthmanohar Do you have a large set of comments like that? Are they already labeled (positive vs. negative)? – Pascal Soucy Mar 14 '17 at 15:43
  • Actually, I have only a couple of hundred of them. They are not labeled, but I can label them manually. – prashanth manohar Mar 14 '17 at 15:55
  • In my experience, it is a bit easier to label documents (positive, negative) than to label words. A text classifier (for example, Naive Bayes) trained on that labeled data would then learn automatically which words and context make a document positive or negative. Several hundred documents might be a bit low to get accurate estimates, though. Removing stopwords would help in your case. I recommend reading http://www.nltk.org/book/ch06.html, particularly section 1.3, if you have not already. – Pascal Soucy Mar 14 '17 at 17:49
  • A good starting point is here: https://web.stanford.edu/~jurafsky/slp3/6.pdf – greeness Mar 16 '17 at 22:33
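
The word-count idea from the first comment could look roughly like the sketch below. It is only a minimal illustration in plain Python, not a definitive implementation. The `POSITIVE` and `NEGATIVE` word sets are hand-picked assumptions (no such lists appear in this thread), and `tokenize`, `lexicon_score`, and `classify` are hypothetical helper names.

```python
import re

# ASSUMPTION: tiny hand-picked word lists, for illustration only.
# A real sentiment lexicon would be much larger and domain-specific.
POSITIVE = {"improving", "positive", "solid", "robust", "strong", "increasing"}
NEGATIVE = {"tough", "inflation", "weak", "declining", "slow"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text):
    """Return (# positive words) - (# negative words) in the text."""
    tokens = tokenize(text)
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return pos - neg


def classify(text):
    """Map the score to a label; a tie (score 0) is reported as neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for comment in [
        "Product demand continues to be solid.",
        "Business holding steady in Q1.",
    ]:
        print(classify(comment), "<-", comment)
```

A count like this ignores negation and context ("not strong" still counts as positive), so it is probably best treated as a baseline to compare a trained classifier against.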

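
The document-labeling approach suggested in the comments (label whole comments, then train Naive Bayes with stopwords removed) can be sketched as follows. This is a hand-rolled multinomial Naive Bayes with add-one smoothing in plain Python, written for illustration rather than NLTK's own classifier. The labels attached to the sample comments are assumptions, since the original comments are unlabeled, and `STOPWORDS`, `tokenize`, and `NaiveBayes` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict

# ASSUMPTION: a very small stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "are", "be", "for", "in", "is", "of",
             "on", "the", "this", "to", "we", "will"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # skip unseen words
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# ASSUMPTION: these labels were assigned by hand for this sketch only.
train_docs = [
    "Product demand continues to be solid.",
    "Sales and business continue to be strong and increasing.",
    "Medical device manufacturing is still strong.",
    "Very positive outlook for this quarter.",
    "Even though oil and gas prices are on the upswing, we still face a "
    "tough 2017 and will continue to save on costs.",
    "Major focus on commodities and potential for further inflation.",
]
train_labels = ["positive"] * 4 + ["negative"] * 2

model = NaiveBayes().fit(train_docs, train_labels)

if __name__ == "__main__":
    print(model.predict("Demand is strong this quarter."))
```

With only six training comments the estimates are not reliable. As noted in the comments, even a few hundred labeled examples may be on the low side, so holding out part of the labeled data to measure accuracy would be a sensible next step.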