Textblob and sentiment analysis: how to refine a dictionary?

Question

Many people use TextBlob for sentiment analysis on text. I am sure I am missing something about the approach and how to use it, because the results I am getting from my analysis do not make sense.

This is an example of data that I have:

     Top                                                Text                                               label  sentiment                                    polarity
51   CVD-Grown Carbon Nanotube Branches on Black Si...  silicon-carbon nanotube (bSi-CNT) hybrid struc...     -1  (-0.16666666666666666, 0.43333333333333335)  -0.166667
69   Navy postpones its largest-ever Milan exercise...  Navy on Tuesday postponed a multi-nation mega ...     -1  (-0.125, 0.375)                              -0.125000
81   Malaysia rings alarm bell on fake Covid...         The United Nations International Children's Em...     -1  (-0.5, 1.0)                                  -0.500000
82   Poison Not Transmitted By Air...                   it falls on the fabric remains 9 hours, so was...     -1  (-0.2, 0.0)                                  -0.200000
87   A WhatsApp rumor is spreading that is allegedl...  strict about unsourced speculation than other ...     -1  (-0.1, 0.1)                                  -0.100000
90   Dumb Whatsapp Forwards - Page 2 - Cricket Web      as the ones that say like or share this pictur...     -1  (-0.375, 0.5)                                -0.375000
144  malaysia | Unicef Malaysia rings alarm b...        such messages claiming to be from us,” #Milan...      -1  (-0.5, 1.0)                                  -0.500000
134  False and unverified claims are being...           Soccer was not issued by the U...                     -1  (-0.4000000000000001, 0.6)                   -0.400000
123  Truth behind the Viral message about Co...         number of stories ever since the wave of misin...     -1  (-0.4, 0.7)                                  -0.400000
166  In India, Fake WhatsApp Forwards on Coronaviru...  of confirmed cases of rises rapidl...                 -1  (-0.5, 1.0)                                  -0.500000

I used the following algorithm:

import numpy as np
import pandas as pd
from textblob import TextBlob

df['sentiment'] = df['Top'].apply(lambda Tweet: TextBlob(Tweet).sentiment)

df1=pd.DataFrame(df['sentiment'].tolist(), index= df.index)

df_new = df
df_new['polarity'] = df1['polarity']
df_new.polarity = df1.polarity.astype(float)
df_new['subjectivity'] = df1['subjectivity']
df_new.subjectivity = df1.polarity.astype(float)
# print(df_new)

conditionList = [
    df_new['polarity'] == 0,
    df_new['polarity'] > 0,
    df_new['polarity'] < 0]
choiceList = ['neutral', 'not_fake', 'fake']
df_new['label'] = np.select(conditionList, choiceList, default='no_label')

But as you can see, all of these messages come from fact-checking sources, so they are not fake. How could I improve the results, perhaps by removing some specific words? I can see that if the text contains "false", "unverified", "viral", or "fake", it is tagged as negative, and this makes the results even worse.

In the first place, sentiment and fact-checking are two different things. They are not correlated in a way that lets you tell whether a sample is fake from its polarity score. – Ashwin Geet D'Sa, Oct 16 '20 at 09:32
Have you tried removing stop words, while focusing on verbs, adjectives and nouns? – lynx, Oct 18 '20 at 11:03
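
A minimal sketch of the word-removal idea raised in the question and in the comment above, assuming the four words listed in the question (false, unverified, viral, fake) are the ones to drop. The names `DOMAIN_WORDS` and `strip_domain_words` are hypothetical, and the cleaned string would still have to be passed to TextBlob afterwards. This only changes what gets scored; it does not turn polarity into a measure of whether a text is fake.

```python
import re

# Words the question reports as pulling polarity down (assumed list).
DOMAIN_WORDS = {"false", "unverified", "viral", "fake"}

# Match any listed word as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, sorted(DOMAIN_WORDS))) + r")\b",
    re.IGNORECASE,
)

def strip_domain_words(text: str) -> str:
    """Remove the listed words, then collapse leftover whitespace."""
    without_words = _PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_words).strip()

print(strip_domain_words("Truth behind the Viral message"))
# Truth behind the message
```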

Stripedbass · Answer 1 · 2020-10-18T13:07:46.317

1

All of your texts have negative polarity, so your code labels them all as fake.

There is no indication of how that polarity field is determined; it appears to be precalculated in the source file. If it uses TextBlob's default polarity algorithm, what text is it being run against?

(Also, there may be a typo: df_new.subjectivity is assigned the float cast of polarity.)
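
The subjectivity typo mentioned above can be illustrated with a small sketch. It assumes pandas is available and uses a stand-in `Sentiment` namedtuple (with `polarity` and `subjectivity` fields) instead of calling TextBlob, so the sample values are copied from the question's table rather than computed.

```python
from collections import namedtuple

import pandas as pd

# Stand-in for TextBlob's sentiment result (assumption: two named fields).
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Two (polarity, subjectivity) pairs copied from the question's table.
df = pd.DataFrame({
    "sentiment": [Sentiment(-0.125, 0.375), Sentiment(-0.5, 1.0)],
})

# Expand the tuples into two columns, keeping the original index.
parts = pd.DataFrame(
    df["sentiment"].tolist(),
    index=df.index,
    columns=["polarity", "subjectivity"],
)

df["polarity"] = parts["polarity"].astype(float)
# Take subjectivity from its own column, not from polarity.
df["subjectivity"] = parts["subjectivity"].astype(float)

print(df[["polarity", "subjectivity"]])
```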

edited Oct 18 '20 at 13:07

answered Oct 18 '20 at 11:48


I used Tweet given the length of the text, but I am not sure whether Google News would have been more appropriate. How could I improve TextBlob's algorithm? I thought I had to use the default one, without changing anything. – Oct 18 '20 at 13:55

TextBlob by design measures “negativity” as polarity. The text “Lying to your mom is not good” returns negative polarity because “not good” is negative. Your code says anything with negative polarity is fake, so “Lying to your mom is not good” gets labeled as fake by your code. This is not a coding problem; your basic premise is flawed. – Stripedbass Oct 18 '20 at 15:29
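
To make the point in the comment above concrete, here is a sketch of the question's labeling rule written as a plain function. The name `label_from_polarity` and the sample score are hypothetical; the score is a placeholder for any negative value, not TextBlob's actual output for that sentence.

```python
def label_from_polarity(polarity: float) -> str:
    """Mirror the np.select conditions from the question."""
    if polarity == 0:
        return "neutral"
    if polarity > 0:
        return "not_fake"
    if polarity < 0:
        return "fake"
    # Reached only when no condition matches (for example NaN).
    return "no_label"

# Placeholder negative score for a true but negatively worded sentence.
assumed_polarity = -0.35
print(label_from_polarity(assumed_polarity))  # fake
```

The rule never looks at the text itself, so any true statement that happens to be worded negatively ends up in the same bucket as a fabricated one.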
I see what you mean. I was looking at this answer: https://datascience.stackexchange.com/questions/75364/better-approach-to-assign-values-to-determine-potential-fake-sentences . I would certainly like to improve the algorithm, as I am not a fan of default built-in functions in Python (though I used TextBlob here). – Oct 18 '20 at 22:09
