Many people use TextBlob for sentiment analysis of text. I am probably missing something about the approach or how to use it, but the results of my analysis look clearly wrong.
This is an example of the data I have:
     Top                                                Text                                               label  sentiment                                    polarity
 51  CVD-Grown Carbon Nanotube Branches on Black Si...  silicon-carbon nanotube (bSi-CNT) hybrid struc...     -1  (-0.16666666666666666, 0.43333333333333335)  -0.166667
 69  Navy postpones its largest-ever Milan exercise...  Navy on Tuesday postponed a multi-nation mega ...     -1  (-0.125, 0.375)                              -0.125000
 81  Malaysia rings alarm bell on fake Covid...         The United Nations International Children's Em...     -1  (-0.5, 1.0)                                  -0.500000
 82  Poison Not Transmitted By Air...                   it falls on the fabric remains 9 hours, so was...     -1  (-0.2, 0.0)                                  -0.200000
 87  A WhatsApp rumor is spreading that is allegedl...  strict about unsourced speculation than other ...     -1  (-0.1, 0.1)                                  -0.100000
 90  Dumb Whatsapp Forwards - Page 2 - Cricket Web      as the ones that say like or share this pictur...     -1  (-0.375, 0.5)                                -0.375000
144  malaysia | Unicef Malaysia rings alarm b...        such messages claiming to be from us,” #Milan...      -1  (-0.5, 1.0)                                  -0.500000
134  False and unverified claims are being...           Soccer was not issued by the U...                     -1  (-0.4000000000000001, 0.6)                   -0.400000
123  Truth behind the Viral message about Co...         number of stories ever since the wave of misin...     -1  (-0.4, 0.7)                                  -0.400000
166  In India, Fake WhatsApp Forwards on Coronaviru...  of confirmed cases of rises rapidl...                 -1  (-0.5, 1.0)                                  -0.500000
I used the following code:
import numpy as np
import pandas as pd
from textblob import TextBlob

# Score each headline; .sentiment is a (polarity, subjectivity) namedtuple
df['sentiment'] = df['Top'].apply(lambda Tweet: TextBlob(Tweet).sentiment)
# Split the tuples into separate 'polarity' and 'subjectivity' columns
df1 = pd.DataFrame(df['sentiment'].tolist(), index=df.index)
df_new = df  # same object as df, not a copy
df_new['polarity'] = df1['polarity'].astype(float)
df_new['subjectivity'] = df1['subjectivity'].astype(float)
# print(df_new)
# Map the sign of the polarity to a label
conditionList = [
    df_new['polarity'] == 0,
    df_new['polarity'] > 0,
    df_new['polarity'] < 0]
choiceList = ['neutral', 'not_fake', 'fake']
df_new['label'] = np.select(conditionList, choiceList, default='no_label')
However, as you can see, all of these messages come from fact-checking sources, so they are not fake. How could I improve the results, perhaps by removing some specific words? I can see that if the text contains words such as false, unverified, viral, or fake, it is tagged as negative, which makes the results even worse.
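To make the word-removal idea concrete, this is a minimal sketch of what I have in mind. The names DOMAIN_WORDS and strip_domain_words are hypothetical, the word list is just the four words mentioned above, and the sample sentence is made up. It only cleans the text and does not call TextBlob itself.

```python
import re

# Hypothetical list: words that describe the topic (misinformation)
# rather than the tone of the message.
DOMAIN_WORDS = {"false", "unverified", "viral", "fake"}

# One case-insensitive pattern that matches any listed word as a whole word.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(sorted(map(re.escape, DOMAIN_WORDS))) + r")\b",
    flags=re.IGNORECASE,
)

def strip_domain_words(text):
    """Remove the listed words and collapse the leftover whitespace."""
    without_words = _PATTERN.sub(" ", text)
    return " ".join(without_words.split())

# Made-up sample sentence, not taken from my data.
print(strip_domain_words("This viral message is fake"))  # prints "This message is"
```

The cleaned text could then be scored instead of the raw column, for example with df['Top'].apply(strip_domain_words) before the TextBlob step. I have not verified that this changes the polarity enough to fix the labels.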