
So this has been driving me mad and I would love it if someone could help!

I have a dataset with 3 columns. Each column is filled with dates. Each date represents a post on a social media platform. For example, if 2 posts were posted to Twitter on 2012-10-10, that date will be recorded twice in the twitter column.

My data looks a bit like this

I want to graph the distribution of each of these columns over time in a density plot.

I want time in months as my x axis.

I want relative frequency as my y axis... like a count of how many posts were on Twitter that month. So for Twitter on 2012-10-10 it would be 2.

And I want all the distributions on the same plot so I can compare them.

So far I have tried a bajillion things, but I can't seem to get all of the above on the same graph and it's driving me mad!

I have made density plots like this one:

A density plot I made

using the following code:

library(dplyr)    # for %>%
library(ggplot2)
library(scales)   # for date_format() and date_breaks()

social_media_dates %>%
  ggplot(aes(x = Facebook_dates)) +
  geom_density(fill = "#69b3a2", color = "#e9ecef", alpha = 0.8) +
  theme_bw() +
  scale_x_date(labels = date_format("%Y-%m"), breaks = date_breaks("3 months"),
               limits = c(as.Date("2016-12-01"), as.Date("2020-05-20"))) +
  labs(title = "Facebook posts over time") +
  xlab("month") +
  ylab("density")

But I don't know how to a) change the y axis into a count of the number of posts, or b) combine the 3 plots on the same graph with the same axes.

I'd ideally like something which looked like the ggridges plots:

example ggridges

Or just all 3 curves on the same graph.

For reference, I'm using ggplot2 in RStudio.

I've tried heaps of things but they just keep on failing! I'm thinking along the lines of having a "date" column with all possible dates in my graph, and making this my x axis. Then calculating the count of posts on each day in a count column per platform.

Eg.

date       | facebook_count | twitter_count | instagram_count
2018-02-01 | 3              | 4             | 10
2018-02-02 | 4              | 8             | 2
2018-02-03 | NA             | 4             | 6

I've made a dataframe which looks like this, but all the plots I've tried it with have broken.

If anyone knows how to do this I would be so thankful!

Phil
Miriam

1 Answer


The step you are missing is reshaping your data frame into long format, so that there is one row per post with the platform in one column and the date in another.

Let's assume your data frame looks as follows:

library(tidyverse)
library(scales)

df <- data.frame(
  fb    = lubridate::ymd(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-03")),
  twi   = lubridate::ymd(c("2020-01-05", "2020-01-05", "2020-01-06", "2020-01-09")),
  insta = lubridate::ymd(c("2020-01-01", "2020-01-02", "2020-01-05", "2020-01-05"))
)

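Now change the data frame into long format (by default pivot_longer() puts the old column names in a column called name and the dates in a column called value):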

df_long <- df %>% pivot_longer(everything())
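To see what "long format" means here, a minimal sketch on a made-up two-column frame. The toy data and the NA are assumptions for illustration only; values_drop_na is probably only needed if your real columns have unequal lengths and are padded with NA.

```r
library(tidyr)

# Hypothetical toy data: two platforms, the second has one fewer post
toy <- data.frame(
  fb  = as.Date(c("2020-01-01", "2020-01-02")),
  twi = as.Date(c("2020-01-05", NA))
)

# One row per post: old column names go to `name`, dates go to `value`.
# values_drop_na = TRUE drops the padding NA instead of keeping it as a row.
toy_long <- pivot_longer(toy, everything(), values_drop_na = TRUE)

print(toy_long)
```

The result has one row per post, which is the shape ggplot needs in order to map the platform to color or fill.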

The long data frame can then be plotted, with one density curve per platform:

df_long %>% ggplot(aes(x = value, color = name, fill = name)) +
  geom_density( alpha=0.8)+
  theme_bw()+
  scale_x_date(labels = date_format("%Y-%m"), 
               breaks = date_breaks("3 months")) +
  labs(title = "Posts over time")+
  xlab("month")+
  ylab("density")

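If what you are after is an actual count of posts per month rather than a density, one option is to round each date down to the first of its month and count rows per platform. This is only a sketch: the toy data below is made up so that it spans a few months, and the column names platform, date, month and posts are my own choices, not anything from your data.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# Made-up data in the same shape as above, spread over three months
df <- data.frame(
  fb    = ymd(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-03")),
  twi   = ymd(c("2020-01-05", "2020-01-05", "2020-01-06", "2020-02-09")),
  insta = ymd(c("2020-01-01", "2020-02-02", "2020-02-05", "2020-03-05"))
)

monthly_counts <- df %>%
  pivot_longer(everything(), names_to = "platform", values_to = "date",
               values_drop_na = TRUE) %>%
  # Round every date down to the first day of its month
  mutate(month = floor_date(date, unit = "month")) %>%
  # Number of posts per platform and month
  count(platform, month, name = "posts")

p <- ggplot(monthly_counts, aes(x = month, y = posts, color = platform)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month") +
  theme_bw() +
  labs(title = "Posts per month", x = "month", y = "number of posts")

p
```

Note that months in which a platform has no posts simply do not appear in monthly_counts, so the line skips over them rather than dropping to zero.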

stefanH
  • Thank you for your answer! This is beautiful! I actually just figured this out about 2h ago but it's so nice to have confirmation! I just have one more question: how do I get my y axis to display a count of posts per month? Do I need to choose another graph type to do this, because at that point it is no longer a density curve? – Miriam May 28 '20 at 03:35
  • Okay, so I just figured out I can add y = after_stat(count) to make my density curves proportional on the y axis, but I think it's breaking the counts up by day, so it has the number of posts per day on the y axis. Do you know if there is any way to make it by month? – Miriam May 28 '20 at 03:57
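
On the after_stat(count) follow-up: for a Date axis the x values are measured in days, so after_stat(count) (the density multiplied by the number of posts) reads as a smoothed number of posts per day. A rough way to express that per month is to multiply it by an average month length. This is a sketch under that assumption; days_per_month and the toy data are made up for illustration, and the result is a smoothed estimate, not an exact monthly tally.

```r
library(ggplot2)
library(tidyr)

# Made-up data spanning a few months
df <- data.frame(
  fb    = as.Date(c("2020-01-01", "2020-01-20", "2020-02-03", "2020-03-03")),
  twi   = as.Date(c("2020-01-05", "2020-02-05", "2020-02-16", "2020-03-09")),
  insta = as.Date(c("2020-01-11", "2020-02-02", "2020-03-05", "2020-03-25"))
)
df_long <- pivot_longer(df, everything(),
                        names_to = "platform", values_to = "date")

# Average month length in days; an approximation, since months differ
days_per_month <- 365.25 / 12

p <- ggplot(df_long, aes(x = date, color = platform, fill = platform)) +
  # count is per day on a Date axis, so rescale it to roughly per month
  geom_density(aes(y = after_stat(count) * days_per_month), alpha = 0.4) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month") +
  theme_bw() +
  labs(title = "Posts over time", x = "month", y = "approx. posts per month")

p
```

If you need exact monthly numbers instead of a smoothed curve, counting the posts per calendar month before plotting is likely the safer route.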