I'm looking at the different weighting options using the dfm_weight. If I select scheme = 'prop' and I group textstat_frequency by location
, what's the proper interpretation of a word in each group?
Say in New York the term career
is 0.6 and and in Boston the word team
is 4.0, how can I interpret these numbers?
corp=corpus(df,text_field = "What are the areas that need the most improvement at our company?") %>%
dfm(remove_numbers=T,remove_punct=T,remove=c(toRemove,stopwords('english')),ngrams=1:2) %>%
dfm_weight('prop') %>%
dfm_replace(pattern=as.character(lemma$first),replacement = as.character(lemma$X1)) %>%
dfm_remove(pattern = c(paste0("^", stopwords("english"), "_"), paste0("_", stopwords("english"), "$")), valuetype = "regex")
freq_weight <- textstat_frequency(corp, n = 10, groups = c("location"))
ggplot(data = freq_weight, aes(x = nrow(freq_weight):1, y = frequency)) +
geom_bar(stat='identity')+
facet_wrap(~ group, scales = "free") +
coord_flip() +
scale_x_continuous(breaks = nrow(freq_weight):1,
labels = freq_weight$feature) +
labs(x = NULL, y = "Relative frequency")