I have a text file and two user-defined word lists, one of positive words and one of negative words. I'm comparing the words in these two lists against the text and returning either positive or negative.

But I also need to print the keywords in the text that caused it to be categorized as positive or negative.

Example of the output I'm looking for:

file_name       IBM                         Keywords     Label

audio1.wav     The customer is good         good         Positive
audio2.wav     the service is bad           bad          Negative

Please let me know how to go about it. Here's the code so far

pos = readwords('C:\\Users\\anagha\\Desktop\\SynehackData\\positive.txt')
neg = readwords('C:\\Users\\anagha\\Desktop\\SynehackData\\Negative.txt')

pos = [w.lower() for w in pos]
neg = [w.lower() for w in neg]

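def assign_comments_labels(x):
    # Note: "w in x" is a substring test on the text, not a whole-word test.
    try:
        if any(w in x for w in pos):
            return 'positive'
        elif any(w in x for w in neg):
            return 'negative'
        else:
            return 'neutral'
    except TypeError:
        # x is not a string (for example NaN from an empty cell)
        return 'neutral'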

import pandas as pd
df = pd.read_csv("C:\\Users\\anagha\\Desktop\\SynehackData\\noise_free_audio\\outputfile.csv", encoding="utf-8") 

df['IBM'] = df['IBM'].str.lower()
df['file_name'] = df['file_name'].str.lower()

df['labels'] = df['IBM'].apply(lambda x: assign_comments_labels(x))

df[['file_name','IBM','labels']] 
bhansa
Anagha
  • Which part are you having a problem with? – bhansa Apr 21 '17 at 10:46
  • I'm unable to create the "Keywords" column. The text is labeled as either positive or negative in the "Label" column based on these words, and I need to add a column that shows the keyword. – Anagha Apr 21 '17 at 10:54

1 Answer

A good start would be to have the right indentation in the assign_comments_labels(x) function. Indent the whole body.

Edited answer:
OK, I get your question now.

This code should work for you, based on the logic you used above:

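def get_keyword(x):
    # Return the first word of x found in either list, or -1 if there is none.
    try:
        # x.split() raises AttributeError when x is not a string (e.g. NaN)
        for word in x.split():
            if word in neg or word in pos:
                return word
    except AttributeError:
        return -1

    return -1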

Then you can apply it with a lambda, as you did for the labels:

df['keywords'] = df['IBM'].apply(lambda x: get_keyword(x))

Edit 2:
To return multiple keywords per sentence, you can modify the code to return a list:

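def get_keyword(x):
    # Collect every word of x that appears in either list.
    keywords = []
    try:
        # x.split() raises AttributeError when x is not a string (e.g. NaN)
        for word in x.split():
            if word in neg or word in pos:
                keywords.append(word)
    except AttributeError:
        return -1

    return keywords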

An even better solution would be to create two functions:

  • get_pos_keywords(x)
  • get_neg_keywords(x)

Then, instead of one keywords column in your DataFrame, you will have two: one for pos and one for neg.

Texts often contain both positive and negative keywords, and the weight of each word determines whether the sentence as a whole is classified as positive or negative. If this is your case, I highly recommend implementing the second solution.

Note:
For the second solution, change the if statement in each function to:

# For positive keywords function    
if word in pos:
    keywords.append(word)

# For negative keywords function
if word in neg:
    keywords.append(word)
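
Put together, the second solution might look roughly like the sketch below. The two word sets are hypothetical sample data standing in for the lists loaded with readwords(), and the shared helper _matching_words is an illustrative name, not something from the code above. Unlike get_keyword, this sketch returns an empty list rather than -1 for non-string input, which is an assumption made here for convenience.

```python
# Hypothetical sample word lists; in the question these come from readwords().
pos = {"good", "great"}
neg = {"bad", "poor"}


def _matching_words(text, vocabulary):
    """Return the words of text that appear in vocabulary, in order."""
    if not isinstance(text, str):  # e.g. NaN from an empty CSV cell
        return []
    # Lowercase and split on whitespace; punctuation is NOT stripped.
    return [word for word in text.lower().split() if word in vocabulary]


def get_pos_keywords(text):
    """Keywords from the positive list found in text."""
    return _matching_words(text, pos)


def get_neg_keywords(text):
    """Keywords from the negative list found in text."""
    return _matching_words(text, neg)


print(get_pos_keywords("The customer is good"))  # ['good']
print(get_neg_keywords("the service is bad"))    # ['bad']
```

With the DataFrame from the question, the two columns could then be filled with df['pos_keywords'] = df['IBM'].apply(get_pos_keywords) and df['neg_keywords'] = df['IBM'].apply(get_neg_keywords). Because the text is only split on whitespace, a word followed by punctuation such as "good." would not match.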

Hope that helps

McN
  • Sorry, the indentation wasn't copied correctly when I pasted the code here. The code itself is working fine for me; I just need to create the keyword column. Please let me know how to go about it. – Anagha Apr 21 '17 at 11:10
  • Can it be modified to match expressions of more than one word, like "very bad", "no money", etc.? – Anagha Apr 21 '17 at 15:07
  • Yes, you can modify it to find expressions of more than one word. You will have to loop through the list of keywords and call your_sentence.find(keyword). If it returns an int other than -1, you have a match. – McN Apr 22 '17 at 05:36
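
The loop described in the last comment could be sketched as follows. The function name find_phrases and the sample phrase list are hypothetical, chosen only for illustration; the only library call relied on is str.find, which returns -1 when the substring is absent.

```python
def find_phrases(sentence, phrases):
    """Return every phrase that occurs as a substring of sentence."""
    sentence = sentence.lower()
    matches = []
    for phrase in phrases:
        # str.find returns the index of the first match, or -1 if there is none.
        if sentence.find(phrase.lower()) != -1:
            matches.append(phrase)
    return matches


# Hypothetical negative phrases, including multi-word expressions.
neg_phrases = ["very bad", "no money", "bad"]
print(find_phrases("The service was very bad", neg_phrases))  # ['very bad', 'bad']
```

Since this is plain substring matching, a short keyword such as "bad" would also match inside a longer word like "badge". If that matters for your data, matching on word boundaries would need an extra check.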