I have a dataset with a column, full_text, that contains review text from an online website. I want to clean these reviews by removing stop words and stemming, then put each review back into its original shape: all stemmed words joined into one sentence, one row per review, rather than one stemmed word per row.
I am attempting the following:
sw <- stop_words %>% filter(lexicon == "SMART")
for (j in 1:nrow(reviews_df)) {
  nostopwords <- reviews_df[j, ] %>%
    unnest_tokens(word, full_text) %>%
    anti_join(sw, by = "word")
  stemmed <- wordStem(nostopwords[ , "word"], language = "porter")
  reviews_df[j, "stemmed_Description"] <- paste(stemmed, collapse = " ")
}
However, the new column stemmed_Description does not look the way I wanted. The words are not stemmed, and instead of a sentence each row holds the text of a vector of strings: c("word1", "word2", "word3").
How can I get a result of the form "stemmedword1 stemmedword2 stemmedword3"?
Current output:
full_text
1 pseudoindependence no one looking over your shoulder and youre free to use your own judgement to problem solve. they sometimes expect more than what a person can give. dont overwork yourself. the packages aint going no where!
stemmed_Description
1 c("pseudoindependence", "shoulder", "youre", "free", "judgement", "problem", "solve", "expect", "person", "give", "dont", "overwork", "packages", "ain't")
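One possible way to get the desired shape, sketched under some assumptions: the c("...") text most likely appears because nostopwords[ , "word"] on a tibble returns a one-column tibble rather than a character vector, so wordStem() ends up converting the whole column to a single string. The sketch below drops the loop and instead stems inside a pipeline, then pastes the stems back together per review. The review_id column and the two-row toy reviews_df are introduced here for illustration only and are not part of the original data.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Toy stand-in for reviews_df (hypothetical; shortened from the sample review)
reviews_df <- data.frame(
  full_text = c(
    "dont overwork yourself. the packages aint going no where!",
    "they sometimes expect more than what a person can give."
  ),
  stringsAsFactors = FALSE
)

sw <- stop_words %>% filter(lexicon == "SMART")

# review_id is a helper key (an assumption) so tokens can be regrouped per review
reviews_df <- reviews_df %>% mutate(review_id = row_number())

stemmed_df <- reviews_df %>%
  unnest_tokens(word, full_text) %>%                     # one token per row
  anti_join(sw, by = "word") %>%                         # drop SMART stop words
  mutate(stem = wordStem(word, language = "porter")) %>% # stem a character vector
  group_by(review_id) %>%
  summarise(stemmed_Description = paste(stem, collapse = " "),
            .groups = "drop")                            # back to one row per review

# Reviews made up entirely of stop words would get NA here
reviews_df <- reviews_df %>% left_join(stemmed_df, by = "review_id")
```

If keeping the original loop is preferred, the smallest change is probably to pass a character vector to wordStem(), for example nostopwords$word instead of nostopwords[ , "word"].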