Similarly to this post, I'm trying to use the French version of the Affective Norms for English Words (ANEW) for sentiment analysis with quanteda. Ultimately, I want to compute a "mean sentiment" for each text in my corpus.

First, I load the ANEW dictionary (FAN in French) and create a named vector of weights. ANEW differs from other dictionaries in that it does not use a key: value pair format, but instead assigns a numerical score to each word. The goal is to select the matching features and then score them using weighted counts. The ANEW file has two columns, MOT and VALENCE, and looks like this: cancer: 1.01, potato: 3.56, love: 6.56.

#### FAN DATA ####
# read in the FAN data
df_fan <- read.delim("fan_anew.txt", stringsAsFactors = FALSE)
# construct a vector of weights with the term as the name
vector_fan <- df_fan$valence
names(vector_fan) <- df_fan$mot
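
One thing worth checking at this step is whether `df_fan$valence` is actually numeric, since the "NAs introduced by coercion" warning reported further down would be consistent with a character column. The sketch below is only an illustration under an assumption the post does not confirm: that the file uses comma decimal marks, as French-formatted data often does. The inline `txt` excerpt is hypothetical and stands in for `fan_anew.txt`.

```r
# Hypothetical tab-separated excerpt; the real layout of fan_anew.txt is not shown in the post.
txt <- "mot\tvalence\ncancer\t1,01\npotato\t3,56\nlove\t6,56"

# With the default decimal mark ".", the comma-decimal column is read as character
df_default <- read.delim(text = txt, stringsAsFactors = FALSE)
is.numeric(df_default$valence)

# Declaring dec = "," lets the same column be parsed as numeric
df_comma <- read.delim(text = txt, dec = ",", stringsAsFactors = FALSE)
is.numeric(df_comma$valence)
```

If `is.numeric(vector_fan)` is already `TRUE` for the real data, this assumption does not apply and the cause lies elsewhere.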

Then I tried to apply dfm_weight() to my corpus of 27 documents.

# create a dfm selecting on the FAN words
dfm_fan <- dfm(my_corpus, select = df_fan$mot, language = "French")

dfm_fan_weighted <- dfm_fan %>%
    dfm_weight(scheme = "prop") %>%
    dfm_weight(weights = vector_fan)
## Warning messages:
## 1: dfm_weight(): ignoring 696 unmatched weight features 
## 2: In diag(weight) : NAs introduced by coercion

Here is what I get: only 6 documents are included in the generated dfm object, and the code does not estimate the mean ANEW score for each document in the original corpus.

tail(dfm_fan_weighted)
## Document-feature matrix of: 6 documents, 335 features (72.6% sparse).
tail(dfm_fan_weighted)[, c("absent", "politique")]
## Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) : invalid character indexing
tail(rowSums(dfm_fan_weighted))
## text22 text23 text24 text25 text26 text27 
##     NA     NA     NA     NA     NA     NA
tail(dfm_fan_weighted)[, c("beau")]
## Document-feature matrix of: 6 documents, 1 feature (100% sparse).
## 6 x 1 sparse Matrix of class "dfm"
## features 
## docs     beau
## text22    0
## text23    0
## text24    0
## text25    0
## text26    0
## text27    0 
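
For reference, the per-document score that the two weighting steps are meant to produce can be sketched in base R on toy data. This does not use quanteda and is not a fix for the code above. The count matrix is invented for illustration, and the valence values are the three examples quoted from the ANEW file.

```r
# Toy counts: 2 documents x 3 features (values invented for illustration)
counts <- matrix(c(2, 0, 1,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("text1", "text2"),
                                 c("cancer", "potato", "love")))

# Named vector of weights, with the term as the name
valence <- c(cancer = 1.01, potato = 3.56, love = 6.56)

# Step 1: within-document proportions (the idea behind scheme = "prop")
props <- counts / rowSums(counts)

# Step 2: multiply each feature column by its valence, matching by name
weighted <- sweep(props, 2, valence[colnames(props)], `*`)

# Step 3: the row sums are the mean valence of the matched words per document
mean_valence <- rowSums(weighted)
mean_valence
```

Because the proportions in each row sum to 1, each row sum of the weighted matrix is a weighted average of the valence scores, which is the "mean sentiment" per text described above.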

Any idea how to fix it? I think the code needs only a few small changes to work properly.

Edit: I edited the code following Ken Benoit's comment.

Tristan G
  • I just edited the [previous ANEW answer](https://stackoverflow.com/questions/44132313/can-the-anew-dictionary-be-used-for-sentiment-analysis-in-quanteda) to reflect changes made in quanteda v1.0.0. Please see that question. – Ken Benoit Mar 24 '18 at 17:59
  • Thanks for your prompt reply. I still have some issues when I try to index the dfm: `tail(dfm_fan_weighted)[, c("absent", "politique")]` gives `Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) : invalid character indexing`. The total scores are also still NA. – Tristan G Mar 24 '18 at 18:40
