-1

I have a question that may be simple but have been having difficult times to find a solution for. I have a data for different companies and for different years that looks like this:

enter image description here

I would like to draw a kind of time series graph for all the companies in one single graph. The point is that I don't want to have 0 for missing values for the corresponding intervals. The result that I expect will have the dates as the X axis and values as the Y axis. So, for example, the result for the line for CompanyA will be a horizontal line which starts from 2001-02 and ends at 2001-06 at the height of 1000 (as Y value). I would like to visualize the intervals for different companies. I was trying to use ggplot2 in R to draw it but not really successful. Could anybody help me to draw this in R? I have more than 500 rows and more than 180 columns.

shiny
  • 3,380
  • 9
  • 42
  • 79
Mina
  • 51
  • 1
  • 1
  • 7
  • 1
    Hi Mina, can you please post a code snip-it so we can see what your working with? Thanks – Justin Jul 26 '17 at 20:41
  • In the future I recommend you to post your data (as in my answer) not a print screen to make it easy for others to help you. In addition as @Justin pointed out you should show the effort you did (the code you tried but was not successful as you mentioned in your question). – shiny Jul 26 '17 at 20:58
  • @Mina In your question, you mentioned you have different years but to me it is same year and just different days or months. Could you please clarify this? – shiny Jul 26 '17 at 21:01
  • @aelwan yes I have the data for 15 years in monthly scale. In your suggested answer you assumed the format is Y-m-d though... – Mina Jul 27 '17 at 02:06
  • @Mina Check my updated answer – shiny Jul 27 '17 at 02:32

2 Answers2

1

Are you looking for something like this?

library(tidyverse)
library(zoo)
df %>% 
  tidyr::gather("date", "value", 2:7) %>% 
  dplyr::filter(!is.na(value)) %>% 
  ggplot(., aes(x = as.factor(as.yearmon(date)), y = value, group = CompanyName, colour = CompanyName))+
    geom_line() +labs(x = "Date")

enter image description here

DATA

df <-  read.table(text =c("
CompanyA    NA  1000    NA  NA  NA  1000
CompanyB    600 NA  NA  NA  600 NA
CompanyC    NA  5000    NA  5000    NA  NA"), header = F) 

colnames(df) <-  c("CompanyName",   "2001-01",  "2001-02"   ,"2001-03", "2001-04",  "2001-05",  "2001-06")
shiny
  • 3,380
  • 9
  • 42
  • 79
  • Thank you @aelwan but here is the problem: my data is not in Y-m-d format... It is in Y-m format... – Mina Jul 27 '17 at 01:58
1

You have multiple issues here that are squarely in the area of "data wrangling". The biggest issue is to impute actual values into your missing value fields.

Luckily, the xts time series library contains functions to do this, as well as a function to plot multiple time series, which is your ultimate goal.

However, before we can use those wonderful functions, you will have to do some work transforming your data into an xts object.

First recreating your data above using the method of @aelwan.

    ```{r, tidy=TRUE}
    
    df <-  read.table(text = c(" 
    CompanyA    NA  1000    NA  NA  NA  1000
    CompanyB    600 NA  NA  NA  600 NA
    CompanyC    NA  5000    NA  5000 NA  NA"), 
                      header = F)   
    colnames(df) <-  c("CompanyName", "2001-01", "2001-02" ,"2001-03", "2001-04",  "2001-05", "2001-06")
   
    df
   
              CompanyName 2001-01 2001-02 2001-03 2001-04 2001-05 2001-06
            1    CompanyA      NA    1000      NA      NA      NA    1000
            2    CompanyB     600      NA      NA      NA     600      NA
            3    CompanyC      NA    5000      NA     500      NA      NA
    

Your data appears to be in wide format, so I would suggest transposing it to long format. This will require a few steps to retain important information such as column and row names, as well as the class of your data (numeric).

First, transpose the data frame

df_t <- t(df)

Now, save the first row, which now contains the company names.

 company_names <- df_t[1,]

The transpose process results in an object of class 'matrix'. Drop the first row and make df_t object class data.frame.

  df_t <- data.frame(df_t[-1, ], stringsAsFactors = FALSE)

Add the company names stored in "company_names" back as the column names

colnames(df_t) <- company_names

Your column data class might have been lost during the transpose as well, so convert all column to class numeric with the sapply function.

   df_long <- data.frame(sapply(df_t, FUN=as.numeric), row.names = rownames(df_t))
    
    # print the long form results
    df_long


        CompanyA CompanyB CompanyC
Jan 2001       NA      600       NA
Feb 2001     1000       NA     5000
Mar 2001       NA       NA       NA
Apr 2001       NA       NA     5000
May 2001       NA      600       NA
Jun 2001     1000       NA       NA

Now, convert your new df_long data.frame into a time series index based xts object to access the time series function you need.

{r}
library(xts)
    
# convert rownames "2001-01, 2001-02, ..." to yearmon format
rownames(df_long) <- as.yearmon(rownames(df_long), "%Y-%m")
    
# pass the dates as an index to the xts via the `order.by` argument.
df_xts <- xts(df_long , order.by = as.yearmon(rownames(df_long)))

Finally, we can use the "Last Observation Carried Forward" function, na.locf in the xts package to fill in the dates.

{r}
df_locf <- na.locf(df_xts)
    
df_locf
         CompanyA CompanyB CompanyC
Jan 2001       NA      600       NA
Feb 2001     1000      600     5000
Mar 2001     1000      600     5000
Apr 2001     1000      600     5000
May 2001     1000      600     5000
Jun 2001     1000      600     5000

When calling the plot function on objects of class xts, multivariate time series plots are produced easily.

{r}
# The plot function works.
plot(df_locf)

xts plot of na.locf filled series

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Justin
  • 1,360
  • 12
  • 15
  • @Justin thank you very much for your very detailed explanation. This solution is exactly what I need. I tried to apply your procedure to my data but in performing rownames(df_long) step it shows an error for duplicate 'row.names' are not allowed ... I checked and there is not this duplication in my data... Could you please help me to get out of this situation? I am adding my data set to my question if it helps... – Mina Jul 27 '17 at 01:57
  • If retaining the dates is less important, a solution would be to skip the rownames(df_long) step and modify the second step in that subsection to `read df_xts <- xts(df_long , order.by = index(df_long))`. This should create your xts obj – Justin Jul 29 '17 at 21:52
  • @Mina Hope this solution worked out for you! I noticed your newer to stack overflow, and I thought I'd mention that if you have a satisfactory solution, its best to mark it as answered. This way others in the community don't have to take the time to read through all of the posts, and can answer other questions. Thanks – Justin Aug 09 '17 at 14:47