
Here is my dataset: https://app.box.com/s/x5eux7mhdc0geyck4o47ttmpynah0wqk


I'd like to create a data frame that holds the average value of each sentiment for every 2-month period.

I tried the following code:

sentiment_dataset$created_at <- ymd_hms(sentiment_dataset$created_at)

sentiment_time <- sentiment_dataset %>% 
  group_by(created_at = cut(created_at, breaks="2 months")) %>%
          summarise(negative = mean(negative),
                    positive = mean(positive)) %>% melt

It gave the following error:

Using created_at as id variables
Error in match.names(clabs, names(xi)) : names do not match previous names

user709413

2 Answers


I'm not sure you can create the grouping variable in the group_by statement. Looks like using mutate beforehand works, though.

library(dplyr)
library(tidyr)

sentiment_time <- sentiment_dataset %>%
  mutate(created_at = cut(created_at, breaks="2 months")) %>%
  group_by(created_at) %>%
  summarize(negative = mean(negative),
            positive = mean(positive)) %>%
  gather('sentiment', 'mean_value', negative, positive)
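
For readers on a newer tidyr (1.0.0 or later), where `gather` is superseded by `pivot_longer`, the same mutate-then-group approach can be sketched as follows. The `toy` data frame is a made-up stand-in for `sentiment_dataset` (its values are not from the real data), and the column names `sentiment` and `mean_value` are just the ones requested in the comments below.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for sentiment_dataset: one row per day for six months.
dates <- seq(as.POSIXct("2018-01-01", tz = "UTC"),
             as.POSIXct("2018-06-30", tz = "UTC"), by = "day")
toy <- tibble(
  created_at = dates,
  negative   = rep(c(0, 1), length.out = length(dates)),
  positive   = rep(c(1, 2, 3), length.out = length(dates))
)

long <- toy %>%
  # Bin timestamps into 2-month intervals, labelled by each interval's start.
  mutate(created_at = cut(created_at, breaks = "2 months")) %>%
  group_by(created_at) %>%
  summarise(negative = mean(negative),
            positive = mean(positive)) %>%
  # Reshape to three columns: period, sentiment type, mean value.
  pivot_longer(cols = c(negative, positive),
               names_to = "sentiment",
               values_to = "mean_value")

print(long)
```

Note that `cut` returns a factor, so `created_at` may need converting back (for example with `as.POSIXct(as.character(created_at))`) before plotting it on a date axis.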
zack
  • Thanks, I'd like to melt. The goal is to have three columns -- timestamp, sentiment and mean value. – user709413 May 15 '18 at 14:50
  • I've edited the answer to include a `gather` statement from the `tidyr` package - pretty sure that does what you're looking for. – zack May 15 '18 at 14:54
  • Thanks. Is there any way to group by weekday, i.e., to compute a mean value for each day of the week? I'd like to chart the sentiment value over the days of the week. I need to answer questions such as: which day has the most positive sentiment? – user709413 May 15 '18 at 15:20
  • @user709413 you just have to change your `group_by` variable. For example, `group_by(weekday = lubridate::wday(created_at, label = TRUE))` – JasonAizkalns May 15 '18 at 15:30
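
The weekday grouping suggested in the last comment could look roughly like this. It is only a sketch: `toy` is invented data (four full weeks, with `positive` rising from Monday to Sunday) standing in for `sentiment_dataset`, and the final `filter` step is one possible way to find the most positive day, not something taken from the thread.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for sentiment_dataset: 28 consecutive days,
# starting on Monday 2018-01-01, so every weekday occurs four times.
toy <- tibble(
  created_at = seq(ymd_hms("2018-01-01 12:00:00"), by = "day", length.out = 28),
  negative   = rep(c(0, 1), times = 14),
  positive   = rep(1:7, times = 4)
)

by_weekday <- toy %>%
  # wday(label = TRUE) returns an ordered factor of weekday names.
  group_by(weekday = wday(created_at, label = TRUE)) %>%
  summarise(negative = mean(negative),
            positive = mean(positive))

# The weekday (or weekdays, on a tie) with the highest mean positive sentiment.
best <- by_weekday %>% filter(positive == max(positive))

print(by_weekday)
print(best)
```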

I'd check out the tibbletime package:

library(tibbletime)
library(tidyverse)

sentiment_dataset %>%
  arrange(created_at) %>%
  as_tbl_time(index = created_at) %>%
  collapse_by("2 months", clean = TRUE) %>%
  group_by(created_at) %>%
  summarise(negative = mean(negative),
            positive = mean(positive))

# A time tibble: 48 x 3
# Index: created_at
   created_at          negative positive
   <dttm>                 <dbl>    <dbl>
 1 2010-09-01 00:00:00    0.143    1.43 
 2 2010-11-01 00:00:00    0.273    0.727
 3 2011-01-01 00:00:00    0.208    0.792
 4 2011-03-01 00:00:00    0.5      1.38 
 5 2011-05-01 00:00:00    0.25     0.75 
 6 2011-07-01 00:00:00    1        1    
 7 2011-09-01 00:00:00    0        1.5  
 8 2011-11-01 00:00:00    0.333    1    
 9 2012-01-01 00:00:00    0        0    
10 2012-03-01 00:00:00    0        1.6  
# ... with 38 more rows

Naturally, you may want to pipe a gather() call after that, for example:

sentiment_dataset %>%
  arrange(created_at) %>%
  as_tbl_time(index = created_at) %>%
  collapse_by("2 months", clean = TRUE) %>%
  group_by(created_at) %>%
  summarise(negative = mean(negative),
            positive = mean(positive)) %>%
  gather(sentiment, mean_sentiment, -created_at) %>%
  ggplot(., aes(created_at, mean_sentiment, color = sentiment)) +
  geom_point() +
  geom_line() +
  geom_smooth()

Line Plot

JasonAizkalns
  • I'd like to basically create a chart using time series data, so I'd like to melt the data set which will have 3 columns -- timestamp (grouped), sentiment type and mean value. – user709413 May 15 '18 at 14:52
  • Thanks a lot. This is what I was looking for, but I'm going to mark the other answer as correct since I'd rather not use `tibbletime`. Thanks again! – user709413 May 15 '18 at 14:59