I have subsetted a set of data from the consumer complaints database. However, I'm having a hard time transforming it into a time series, especially since the same issue can be reported several times within the same time frame (the rows are not unique). My end goal is a line plot that compares how often each issue is reported, organized by month.

Here are the first 6 rows of the subsetted data.frame, which has over 750,000 entries in total:

Date        Issue 
08/25/14    Making/receiving payments, sending money None   
04/20/17    Other       
02/14/14    Billing disputes
08/30/13    Managing the loan or lease  
10/03/14    Billing disputes    
01/07/13    Billing disputes
Maurits Evers
  • What have you tried that doesn't work? You're describing problems that won't be present in just these 6 rows of data, so it's hard to know how you've tried dealing with them – camille Nov 19 '19 at 05:16

1 Answer

Something like this?

# Simulated data: 100 rows sampled from dates and issues like those in the question
df <- data.frame(
    stringsAsFactors = FALSE,
    Date = sample(c("08/25/14", "04/20/17", "02/14/14", "08/30/13", "10/03/2014",
                    "1/07/2013"), 100, replace = TRUE),
    Issue = sample(c("Making/receiving", "Other", "Billing", "Managing", "Billing",
                     "Billing"), 100, replace = TRUE)
)

library(lubridate)
library(dplyr)
library(ggplot2)

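df <- df %>% 
    mutate(
        Date = mdy(Date),                    # parse month/day/year strings into Dates
        Year = year(Date),
        Month = month(Date),
        Period = make_date(Year, Month, 1)   # first day of the month each complaint falls in
    ) %>% 
    group_by(Period, Issue) %>% 
    summarise(
        incidents = n()                      # number of complaints per month and issue
    ) %>% 
    ungroup()

# geom_line connects points in order of x, which suits a time axis
ggplot() +
    geom_line(data = df, mapping = aes(x = Period, y = incidents, colour = Issue))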

Created on 2019-11-19 by the reprex package (v0.3.0)
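
If you would rather avoid the extra packages, the same monthly counting step can be sketched in base R. This is only an illustration, not part of the reprex above: the `complaints` data frame is a made-up sample shaped like the rows in the question (plus one extra row so that one month has a repeated issue), and it assumes every date uses a two-digit year in month/day/year order.

```r
# Hypothetical sample with the same two columns as the question's data
complaints <- data.frame(
    Date = c("08/25/14", "04/20/17", "02/14/14", "08/30/13",
             "10/03/14", "01/07/13", "02/20/14"),
    Issue = c("Making/receiving payments, sending money", "Other",
              "Billing disputes", "Managing the loan or lease",
              "Billing disputes", "Billing disputes", "Billing disputes"),
    stringsAsFactors = FALSE
)

# Parse the strings as dates (assumes mm/dd/yy throughout)
complaints$Date <- as.Date(complaints$Date, format = "%m/%d/%y")

# Collapse each date to the first day of its month
complaints$Period <- as.Date(format(complaints$Date, "%Y-%m-01"))

# Give every row a weight of 1, then sum the weights per month and issue
complaints$incidents <- 1L
monthly <- aggregate(incidents ~ Period + Issue, data = complaints, FUN = sum)

print(monthly)
```

The resulting `monthly` data frame has one row per month and issue, so duplicate reports within a month simply add to the count, and it can be passed to the same `ggplot()` call as above. Month and issue combinations with no complaints do not appear as zero rows, so a line plot will draw straight across such gaps.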

Simon Woodward