I'm attempting to perform sentiment analysis following the approach at http://tidytextmining.com/sentiment.html#the-sentiments-dataset and first need to convert my dataset into a tidy format.
My dataset has this form:
x <- c( "test1" , "test2")
y <- c( "this is test text1" , "this is test text2")
res <- data.frame( "url" = x, "text" = y)
res
    url               text
1 test1 this is test text1
2 test2 this is test text2
To get one observation per row, I need to process the text column and add new columns containing each word and the number of times it appears for that url. The same url will then appear in multiple rows.
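For concreteness, the target shape described above might look like the following. This is a hand-built illustration only, not the output of any pipeline; the column names `word` and `n` are assumptions chosen for the example.

```r
# Hand-built illustration of the desired tidy shape:
# one row per (url, word) pair, with n = number of times
# that word appears in the text for that url.
# The column names `word` and `n` are assumed for illustration.
expected <- data.frame(
  url  = rep(c("test1", "test2"), each = 4),
  word = c("this", "is", "test", "text1",
           "this", "is", "test", "text2"),
  n    = 1L,
  stringsAsFactors = FALSE
)
expected
```

Each url is repeated once per distinct word in its text, which is why the same url shows up on several rows.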
Here is my attempt:
library(tidyverse)
x <- c( "test1" , "test2")
y <- c( "this is test text1" , "this is test text2")
res <- data.frame( "url" = x, "text" = y)
res
res_1 <- data.frame(res$text)
res_2 <- as_tibble(res_1)
res_2 %>% count(res.text, sort = TRUE)
which returns:
# A tibble: 2 x 2
            res.text     n
              <fctr> <int>
1 this is test text1     1
2 this is test text2     1
How can I count the words in res$text while keeping the url, so that I can perform sentiment analysis?
Update:
x <- c( "test1" , "test2")
y <- c( "this is test text1" , "this is test text2")
res <- data.frame( "url" = x, "text" = y)
res
res %>%
  group_by(url) %>%
  transform(text = strsplit(text, " ", fixed = TRUE)) %>%
  unnest() %>%
  count(url, text)
which returns the error:
Error in strsplit(text, " ", fixed = TRUE) : non-character argument
I'm converting to a tibble because this appears to be the format required for the tidytextmining sentiment analysis: http://tidytextmining.com/sentiment.html#the-sentiments-dataset
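The `non-character argument` error above is consistent with `text` being a factor rather than a character vector (the `<fctr>` column type in the earlier output points the same way; before R 4.0.0, `data.frame()` converted strings to factors by default). A minimal base R sketch of the word count per url, assuming that is the cause, could look like this. The names `words`, `long`, and `counts` are made up for the example.

```r
x <- c("test1", "test2")
y <- c("this is test text1", "this is test text2")
# Keep text as character instead of factor
res <- data.frame(url = x, text = y, stringsAsFactors = FALSE)

# Split each text into words; as.character() also guards against factor input
words <- strsplit(as.character(res$text), " ", fixed = TRUE)

# Repeat each url once per word so the url stays attached to its words
long <- data.frame(
  url  = rep(res$url, lengths(words)),
  word = unlist(words),
  n    = 1L,
  stringsAsFactors = FALSE
)

# Sum the per-row counts for every (url, word) combination
counts <- aggregate(n ~ url + word, data = long, FUN = sum)
counts
```

This only avoids extra dependencies. As far as I know, the linked book does the tokenizing step with `unnest_tokens()` from the tidytext package, which may be the more idiomatic route once `text` is a character column.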