T-test after running sentiment analysis

Question

I am trying to run a t-test after doing the sentiment analysis. I did the sentiment analysis, and grouped my data into two parts:

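library(dplyr)     # %>%, inner_join(), group_by(), summarize()
library(tidytext)  # unnest_tokens(), get_sentiments()
library(textdata)  # provides the AFINN lexicon that get_sentiments() downloads
afinn_dictionary <- get_sentiments("afinn")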

news_tokenized <- full_data %>%
  unnest_tokens(word, full_article, to_lower = TRUE)
head(news_tokenized$word, 10) 
full_data$full_article[2]

word_counts_senti <- news_tokenized %>% 
  inner_join(afinn_dictionary, by = "word") # keep only words that have an AFINN score

head(word_counts_senti)

news_senti <- word_counts_senti %>% 
  group_by(partisan_media) %>% #group by partisan media
  summarize(sentiment = sum(value))

head(news_senti) 
#as a result, I got: c(1): -13194, c(2): -12321. Both group 1 and 2 were negative, but group 1's stories tend to use more negative words (have greater negative sentiment).
table(full_data$partisan_media) #there are 1866 articles in group 1 and 2174 articles in group 2

I am trying to see if the differences between groups 1 and 2 (two groups of partisan media) are statistically different by running a t-test. I'm using:

g1_senti = rnorm(1866, mean = -7.07074, sd = ) #group1
g2_senti = rnorm(2174, mean = -5.667433, sd = ) #group2
t.test(g1_senti, g2_senti)

Each mean is the group's total sentiment score divided by the number of articles in that group (for example, -13194 / 1866 = -7.07074 for group 1). But I wasn't sure what should be entered inside the parentheses for the sd. Does anyone have an idea about this?

I am adding my data set here: https://www.mediafire.com/file/uei2e3tajvi7wao/eight.csv/file

I referred to https://statistics.berkeley.edu/computing/r-t-tests and just wanted to follow this instruction. Is it incorrect to use rnorm here? — monete, Dec 06 '21 at 18:10
rnorm creates randomly generated data. They use it to simulate some examples. If you're analyzing real data you just use the data you have. You don't need to simulate anything. — Dason, Dec 06 '21 at 18:26
I see--I will remove the rnorm function from my t-test. Thanks! This is a link to my data: https://www.mediafire.com/file/uei2e3tajvi7wao/eight.csv/file — monete, Dec 06 '21 at 18:37
I'm not sure if I am doing this correctly, but can I use: t.test(word_counts_senti$value ~ word_counts_senti$partisan_media)? "word_counts_senti$value" represents the sentiment scores of each news article and "word_counts_senti$partisan_media" is a grouping variable (that assigns each article to group 1 or 2) — monete, Dec 06 '21 at 19:59
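Following up on the comments above, here is a minimal sketch of one way to run the test on the observed data rather than on rnorm() draws. It rests on assumptions that are not in the question: full_data is replaced by a six-row toy table, article_id is a hypothetical column that identifies each article, and toy_lexicon is a small made-up stand-in for the AFINN lexicon so the example runs without a download. One thing to keep in mind is that word_counts_senti has one row per matched word, not per article, so the sketch first sums the word values within each article and only then compares the two groups with Welch's t-test (the default for t.test()).

```r
library(dplyr)
library(tidytext)

# Toy stand-in for full_data (hypothetical values; article_id is an assumed column)
full_data <- tibble(
  article_id = 1:6,
  partisan_media = c(1, 1, 1, 2, 2, 2),
  full_article = c(
    "A terrible and awful day",
    "Bad news and a terrible loss",
    "An awful result but good effort",
    "A good day with bad weather",
    "Great win and good news",
    "Bad start but a great finish"
  )
)

# Toy stand-in for the AFINN lexicon (scores chosen for illustration only)
toy_lexicon <- tibble(
  word  = c("terrible", "awful", "bad", "loss", "good", "great", "win"),
  value = c(-3, -3, -3, -3, 3, 3, 4)
)

# One sentiment score per article: sum the word values within each article
article_senti <- full_data %>%
  unnest_tokens(word, full_article) %>%
  inner_join(toy_lexicon, by = "word") %>%
  group_by(article_id, partisan_media) %>%
  summarize(sentiment = sum(value), .groups = "drop")

# Welch two-sample t-test on the per-article scores
result <- t.test(sentiment ~ partisan_media, data = article_senti)
print(result)
```

With the real data, the same pipeline would use afinn_dictionary in place of toy_lexicon. Articles that contain no lexicon words drop out at the inner_join() step, so the group sizes in the test may end up smaller than the counts from table(full_data$partisan_media).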
