I have a data frame that looks like this:

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   344479 obs. of  6 variables:
 $ REGION        : chr  "NSW1" "NSW1" "NSW1" "NSW1" ...
 $ SETTLEMENTDATE: POSIXct, format: "1998-12-07 02:00:00" "1998-12-07 02:30:00" "1998-12-07 03:00:00" "1998-12-07 03:30:00" ...
 $ TOTALDEMAND   : num  3294 5337 5296 5266 5330 ...
 $ RRP           : num  8.01 11.16 13.52 12.52 13.01 ...
 $ PERIODTYPE    : chr  "TRADE" "TRADE" "TRADE" "TRADE" ...
 $ month         : num  12 12 12 12 12 12 12 12 12 12 ...

I am attempting to create a year_quarter variable that is a string and has the form: 2014-Q1 (denoting year/quarter) by extracting these from the SETTLEMENTDATE variable.

There are a number of solutions to this problem via zoo or lubridate, but I'm hoping that somebody can tell me why my function doesn't work:

quarter_fun <- function(df){
    df$quarter <- NA
    if (df$month <= 3){
        df$quarter <- paste(format(df$SETTLEMENTDATE, format = "%Y")[1], "Q1", sep="-")
    } else if (df$month >= 4 & df$month <= 6){
        df$quarter <- paste(format(df$SETTLEMENTDATE, format = "%Y")[1], "Q2", sep="-")
    } else if (df$month >= 7 & df$month <= 9){
        df$quarter <- paste(format(df$SETTLEMENTDATE, format = "%Y")[1], "Q3", sep="-")
    } else if (df$month == 10){
        df$quarter <- paste(format(df$SETTLEMENTDATE, format = "%Y")[1], "Q4", sep="-")
    }
}

I get this error message:

the condition has length > 1 and only the first element will be used
the condition has length > 1 and only the first element will be used
the condition has length > 1 and only the first element will be used
the condition has length > 1 and only the first element will be used

Any help would be greatly appreciated - again this is less about finding a solution to the task at hand but rather it is about understanding why my attempt is not working as there is clearly a false assumption (or several) on my end somewhere along the way.

Thanks!

Davide Lorino

1 Answer

Your solution ignores the fact that df$month is a vector, whereas an if statement needs a condition that evaluates to a single TRUE/FALSE value. Your comparisons produce logical vectors with one TRUE/FALSE value per row, so if looks only at the first element and ignores the rest. Hence the warning message that "only the first element will be used".
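To make the point concrete, here is a minimal sketch using a made-up month vector (the values are hypothetical stand-ins for df$month). It shows that the comparison returns one logical value per element, and two ways of getting a single value for if. Note that in R 4.2.0 and later, a condition of length greater than one is an error rather than a warning.

```r
# Hypothetical month values standing in for df$month
month <- c(12, 1, 5, 10)

# A comparison on a vector yields one logical value per element
is_q1 <- month <= 3
print(is_q1)  # FALSE TRUE FALSE FALSE

# if () wants exactly one TRUE/FALSE, so either reduce the vector first...
if (any(is_q1)) {
  message("at least one month falls in Q1")
}

# ...or test a single element
if (month[1] <= 3) {
  message("the first month falls in Q1")
} else {
  message("the first month does not fall in Q1")
}
```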

Instead, consider using cut to relabel the numeric months:

numeric.months <- 1:12
quarters <- cut(numeric.months, seq(0, 12, 3), labels = paste0('Q', 1:4), include.lowest = TRUE)
quarters

 [1] Q1 Q1 Q1 Q2 Q2 Q2 Q3 Q3 Q3 Q4 Q4 Q4
Levels: Q1 Q2 Q3 Q4
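
As a rough sketch of how this could be applied to the question's data, the same cut call can be combined with format and paste, both of which are vectorised. The toy data frame below is an assumption made up for illustration (only the SETTLEMENTDATE and month columns are reproduced), and year_quarter is the column name suggested in the question.

```r
# Toy stand-in for the real data; values are made up for illustration
df <- data.frame(
  SETTLEMENTDATE = as.POSIXct(
    c("1998-12-07 02:00:00", "1999-02-01 00:30:00", "2014-05-15 12:00:00"),
    tz = "UTC"
  ),
  month = c(12, 2, 5)
)

# Label every month with its quarter, element by element
q <- cut(df$month, breaks = seq(0, 12, 3), labels = paste0("Q", 1:4))

# format() and paste() work on whole vectors, so no [1] subsetting is needed
df$year_quarter <- paste(format(df$SETTLEMENTDATE, "%Y"), q, sep = "-")

print(df$year_quarter)  # "1998-Q4" "1999-Q1" "2014-Q2"
```

Dropping the [1] matters: in the original function it takes the year from the first row only, so every row would receive the same year.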
jdobres
  • Thank you for a really helpful explanation and alternative solution! I am happy with your way of achieving this goal, and the various other ways provided by packages such as lubridate and zoo. Is there a way to modify my function in some way so that it performs the desired operation? For instance, do I need to wrap the 'if' statements in some sort of for loop in order to vectorize the function? – Davide Lorino Sep 03 '18 at 22:34
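
On the follow-up question: a for loop would work, but one possible way to keep the shape of the original function is to swap the if/else chain for nested ifelse calls, which test every element. The sketch below is only one option, not the answerer's method. It also makes two changes that go beyond vectorising: the last branch covers months 10 to 12 (the original tested month == 10 only), and the function returns df, since the original never returned the modified data frame. The toy input is hypothetical.

```r
quarter_fun <- function(df) {
  # Year of every row, not just the first one
  year <- format(df$SETTLEMENTDATE, format = "%Y")

  # ifelse() evaluates the condition for each element of df$month
  q <- ifelse(df$month <= 3, "Q1",
       ifelse(df$month <= 6, "Q2",
       ifelse(df$month <= 9, "Q3", "Q4")))

  df$quarter <- paste(year, q, sep = "-")
  df  # return the modified copy so the caller can keep it
}

# Toy input with made-up values
toy <- data.frame(
  SETTLEMENTDATE = as.POSIXct(
    c("1998-12-07 02:00:00", "2014-02-01 00:00:00"),
    tz = "UTC"
  ),
  month = c(12, 2)
)

toy <- quarter_fun(toy)
print(toy$quarter)  # "1998-Q4" "2014-Q1"
```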