I am trying to run a t-test on the results of a sentiment analysis. I ran the sentiment analysis and split my data into two groups:
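library(tidytext)  # unnest_tokens(), get_sentiments()
library(dplyr)     # %>%, inner_join(), group_by(), summarize()
library(textdata)  # needed to download the AFINN lexicon
afinn_dictionary <- get_sentiments("afinn")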
news_tokenized <- full_data %>%
  unnest_tokens(word, full_article, to_lower = TRUE)
head(news_tokenized$word, 10)
full_data$full_article[2]
word_counts_senti <- news_tokenized %>%
  inner_join(afinn_dictionary, by = "word") # keep only words that have an AFINN score
head(word_counts_senti)
news_senti <- word_counts_senti %>%
  group_by(partisan_media) %>% # group by partisan media
  summarize(sentiment = sum(value))
head(news_senti)
# Result: group 1 = -13194, group 2 = -12321. Both totals are negative, but group 1's stories tend to use more negative words (stronger negative sentiment).
table(full_data$partisan_media) # 1866 articles in group 1 and 2174 articles in group 2
I want to test whether the difference in sentiment between groups 1 and 2 (the two groups of partisan media) is statistically significant by running a t-test. I'm using:
g1_senti = rnorm(1866, mean = -7.07074, sd = ) #group1
g2_senti = rnorm(2174, mean = -5.667433, sd = ) #group2
t.test(g1_senti, g2_senti)
Each mean is the group's total sentiment score divided by the number of articles in that group (-13194 / 1866 and -12321 / 2174). However, I'm not sure what to enter for the sd argument. Does anyone have an idea about this?
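For reference, here is a minimal sketch of one way the sd might be obtained. It assumes each article has an identifier (a hypothetical `article_id` column, which is not in my code above): compute one sentiment score per article, then take the mean and sd of those scores within each group. With article-level scores available, they could also be passed to t.test directly instead of being simulated with rnorm. The toy data frame below is made up purely for illustration and only stands in for word_counts_senti.

```r
# Toy stand-in for word_counts_senti: one row per matched word.
# article_id is a hypothetical column identifying each article.
set.seed(1)
toy_words <- data.frame(
  article_id     = rep(1:40, each = 5),
  partisan_media = rep(c(1, 2), each = 100),
  value          = sample(-5:5, 200, replace = TRUE)
)

# One sentiment score per article (sum of the word values within the article)
article_senti <- aggregate(value ~ article_id + partisan_media,
                           data = toy_words, FUN = sum)

# Per-group mean and standard deviation of the article-level scores
group_mean <- tapply(article_senti$value, article_senti$partisan_media, mean)
group_sd   <- tapply(article_senti$value, article_senti$partisan_media, sd)

# Welch two-sample t-test on the observed article scores (no rnorm needed)
result <- t.test(value ~ partisan_media, data = article_senti)
print(result)
```

One caveat with this sketch: articles with no words in the AFINN lexicon drop out at the inner_join step, so the per-group mean of article scores may not equal the group total divided by the full article count.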
I am adding my data set here: https://www.mediafire.com/file/uei2e3tajvi7wao/eight.csv/file