-1

I am new to R and very stuck on a problem which I've tried to solve in various ways.

I have data that I want to plot as a graph showing Twitter engagements per day.

To do this, I need to merge all the rows that share a 'created at' date, so there is only one row per date, and each date has its 'total engagements' assigned to it.

This is the data:

[image of the data, not reproduced]

So far, I've tried to do this, but can't seem to get the grouping to work.

I mutated the data to get a new 'total engage' column:

lgbthm_data_2 <- lgbthm_data %>%
  mutate(
    total_engage = favorite_count + retweet_count
  )

Then I've tried to merge the dates:

only_one_date <- lgbthm_data_2 %>%
  group_by(created_at) %>%
  summarise_all(na.omit)

But no idea!

Any help would be great

Thanks

  • 2
    Suggestion for next time: images of data are **not** data. Please post relevant sample data using the output of `dput(mydata)` or, if this is too large, something like `dput(head(mydata, 20))`. This will probably result in better/faster/more relevant answers, and will also prevent downvotes. – Wimpel Apr 04 '22 at 08:56
  • Oops - thanks - noted for next time! – rhenderson Apr 05 '22 at 16:44

2 Answers

0

You are looking for:

library(dplyr)
only_one_date <- lgbthm_data_2 %>%
  group_by(created_at) %>%
  summarise(n = n())

And there is even a shorthand for this in dplyr:

only_one_date <- lgbthm_data_2 %>%
  count(created_at)

group_by + summarise can be used for many things that involve summarising all values in a group to one value, for example the mean, max and min of a column. Here I think you simply want to know how many rows each group has, i.e., how many tweets were created in one day. The special function n() tells you exactly that.

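From experience with Twitter data, I also know that the column created_at usually holds a date-time (a timestamp), not a plain date. In that case, grouping on it directly gives one group per timestamp rather than per day, so it makes sense to use count(day = as.Date(created_at)) to convert it to a date first.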
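
A minimal sketch of that conversion, using a made-up `tweets` tibble (hypothetical sample data, not the asker's dataset). It also shows `count()` with its `wt` argument, which sums a column per group instead of counting rows; treating that sum as "total engagements per day" is an assumption about what the question is after.

```r
library(dplyr)

# Hypothetical sample: created_at is a date-time, not a plain date
tweets <- tibble(
  created_at = as.POSIXct(
    c("2022-02-01 09:15:00", "2022-02-01 18:40:00", "2022-02-02 07:05:00"),
    tz = "UTC"
  ),
  favorite_count = c(0, 1, 2),
  retweet_count = c(2, 3, 0)
)

# Number of tweets per calendar day
tweets_per_day <- tweets %>%
  count(day = as.Date(created_at))

# Total engagements per calendar day: wt is summed within each group
engage_per_day <- tweets %>%
  count(
    day = as.Date(created_at),
    wt = favorite_count + retweet_count,
    name = "total_engage"
  )
```

Both results have one row per day; the first counts rows, the second adds up favorites and retweets.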

JBGruber
  • Hi there! Great thank you - that makes a lot of sense. Is there a way to do this, while retaining the screen_name column? Essentially, I have 3 datasets for different Twitter accounts, and I'm trying to plot them onto one chart which shows the number of total engagements per day for each for the month of February. So what I'm really struggling with is how to get total engagements per day, plus screenname, into one table. So what the above creates is perfect to get the total engagements per day, but I also want screenname, that's in the original dataset, to be included. – rhenderson Apr 05 '22 at 16:56
  • I would first combine the datasets with `bind_rows()`. If you want to get engagements per day per screen name, you can use screen_name as a second grouping variable. So `group_by(created_at, screen_name)` or `count(created_at, screen_name)`. – JBGruber Apr 06 '22 at 07:36
  • 1
    On a side note: it's considered bad practice to change the target of a question after you've already received some answers. If you have further questions, it's a good idea to accept an answer (and upvote it if you found it useful) and then post a follow-up question. – JBGruber Apr 06 '22 at 09:55
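
The `bind_rows()` suggestion from the comments could look roughly like the sketch below. The two tibbles `account_a` and `account_b` are hypothetical stand-ins for the per-account datasets, and replacing a missing count with 0 via `coalesce()` is an assumption, not something stated in the thread.

```r
library(dplyr)

# Hypothetical stand-ins for two of the per-account datasets
account_a <- tibble(
  screen_name = "account_a",
  created_at = as.Date(c("2022-02-01", "2022-02-01", "2022-02-02")),
  favorite_count = c(0, 1, 2),
  retweet_count = c(2, 3, NA)
)
account_b <- tibble(
  screen_name = "account_b",
  created_at = as.Date(c("2022-02-01", "2022-02-03")),
  favorite_count = c(5, 1),
  retweet_count = c(1, 0)
)

engage_by_account <- bind_rows(account_a, account_b) %>%
  mutate(
    day = as.Date(created_at),
    # Assumption: a missing count is treated as 0 rather than dropping the row
    total_engage = coalesce(favorite_count, 0) + coalesce(retweet_count, 0)
  ) %>%
  # Grouping by screen_name as well keeps that column in the result
  group_by(screen_name, day) %>%
  summarise(total_engage = sum(total_engage), .groups = "drop")
```

The result has one row per account per day, which is the long shape that ggplot2 expects when mapping `screen_name` to colour.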
0
library(tidyverse)

data <- tribble(
  ~created_at, ~favorite_count, ~retweet_count,
  "2022-02-01", 0, 2,
  "2022-02-01", 1, 3,
  "2022-02-02", 2, NA
)

summary_data <-
  data %>%
  type_convert() %>%
  group_by(created_at) %>%
  summarise(total_engage = sum(favorite_count, retweet_count, na.rm = TRUE))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   created_at = col_date(format = "")
#> )
summary_data
#> # A tibble: 2 × 2
#>   created_at total_engage
#>   <date>            <dbl>
#> 1 2022-02-01            6
#> 2 2022-02-02            2

qplot(created_at, total_engage, geom = "col", data = summary_data)

Created on 2022-04-04 by the reprex package (v2.0.0)
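
`qplot()` has been deprecated since ggplot2 3.4.0, so on a current installation the plotting line in the reprex above may emit a deprecation warning. A hedged equivalent with `ggplot()` is sketched here; `summary_data` is rebuilt by hand from the printed output so the block runs on its own, and the axis labels are illustrative.

```r
library(ggplot2)

# Same values as the summary printed in the reprex, re-entered by hand
summary_data <- data.frame(
  created_at = as.Date(c("2022-02-01", "2022-02-02")),
  total_engage = c(6, 2)
)

# One bar per day, bar height = total engagements
p <- ggplot(summary_data, aes(x = created_at, y = total_engage)) +
  geom_col() +
  labs(x = "Date", y = "Total engagements")

p
```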

danlooo
  • Thanks so much! This works perfectly but what I should've explained is this is part of my plan to create a plot with various datasets. So what I'm trying to do, is have a dataset which has: screen_name, total_engagements (so the favourites and retweets columns merge) and then also have the data for one date merged into one - so all the 'total_engagements' for any particular day will be added together. I can't get all three of these things to work at the same time as I'm struggling to retain 'screen_name' in anything I do. – rhenderson Apr 05 '22 at 17:03
  • Multiple datasets can be combined using [mutating joins](https://dplyr.tidyverse.org/articles/two-table.html#mutating-joins). However, there should be only one question per Stack Overflow post. – danlooo Apr 06 '22 at 07:08