2

I am having some problems using the lag function in dplyr. This is my dataset.

ID <- c(100, 100, 100, 200, 200, 300, 300)
daytime <- c("2010-12-21 06:00:00", "2010-12-21 09:00:00", "2010-12-21 13:00:00 ", "2010-12-23 23:00:00", "2010-12-24 02:00:00", "2010-12-25 19:00:00", "2010-12-31 08:00:00")
lagfirstvisit <- c(0, 0, 2, 0, 1, 0, 0) 
table <- cbind(ID, daytime, lagfirstvisit) 
table <- as.data.frame(table)
table$daytime <- as.POSIXct(table$daytime)

My aim is to generate a new column with the lag of variable daytime by the number as indicated in the lagfirstvisit column. i.e. If lagfirstvisit == 2, I would want the lag2 daytime value of the particular ID. If lagfirstvisit == 0, it would mean to keep the observation row's original daytime value.

My expected result is as follow:

ID <- c(100, 100, 100, 200, 200, 300, 300)
daytime <- c("2010-12-21 06:00:00", "2010-12-21 09:00:00", "2010-12-21 13:00:00 ", "2010-12-23 23:00:00", "2010-12-24 02:00:00", "2010-12-25 19:00:00", "2010-12-31 08:00:00")
lagfirstvisit <- c(0, 0, 2, 0, 1, 0, 0) 
result <- c("2010-12-21 06:00:00", "2010-12-21 09:00:00", "2010-12-21 06:00:00", "2010-12-23 23:00:00", "2010-12-23 23:00:00", "2010-12-25 19:00:00", "2010-12-31 08:00:00")
table.results <- cbind(ID, daytime, lagfirstvisit, result) 

Currently, the code I am using is:

table <- table %>%  
group_by(ID) %>% 
mutate(result = lag(as.POSIXct(daytime, format="%m/%d/%Y %H:%M:%S", tz= "UTC"), n = as.integer(lagfirstvisit)))

However, I get the error:

Error in mutate_impl(.data, dots) : Evaluation error: n must be a non-negative integer scalar, not integer of length 3.

Does, anyone out there know how do I resolve this problem? Thank you very much!

Huicong
  • 65
  • 1
  • 7
  • Welcome to Stack Overflow! Please [format your code appropriately](https://meta.stackexchange.com/a/22189/371738). – jay.sf Jul 18 '18 at 07:50
  • 1
    As shown by your expected result, Is there a problem with `day - lagfirstvisit` – A. Suliman Jul 18 '18 at 07:54
  • Hi guys, sorry I updated my question to reflect what I actually want to figure out. Sorry for the blunder at first. – Huicong Jul 18 '18 at 08:18

3 Answers3

3
table.results %>%
  group_by(ID) %>%
  mutate(
    result2=mapply(`[`, list(day), row_number() - lagfirstvisit)
  )
# A tibble: 7 x 5
# Groups:   ID [3]
     ID   day lagfirstvisit result result2
  <dbl> <dbl>         <dbl>  <dbl>   <dbl>
1  100.   21.            0.    21.     21.
2  100.   22.            0.    22.     22.
3  100.   23.            2.    21.     21.
4  200.   12.            0.    12.     12.
5  200.   13.            1.    12.     12.
6  300.   19.            0.    19.     19.
7  300.   22.            0.    22.     22.
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • This could also be done with `purrr::map` or even `sapply`. – r2evans Jul 18 '18 at 08:04
  • Hi thanks! But i actually updated my question to reflect a change in the variable. Sorry, but would you mind helping me take a look at that as well? Thank you. Sorry I am new to stack overflow. – Huicong Jul 18 '18 at 08:22
  • How would changing from `day` to `daytime` affect the answer? It's just simple vector indexing. Had you tried it? – r2evans Jul 18 '18 at 15:00
1
    table%>%
      mutate_all(~as.numeric(as.character(.x)))%>%#First ensure all columns are numeric
      mutate(result=day[1:n()-lagfirstvisit])# you can also use row_number() instead of 1:n()

  ID day lagfirstvisit result
1 100  21             0     21
2 100  22             0     22
3 100  23             2     21
4 200  12             0     12
5 200  13             1     12
6 300  19             0     19
7 300  22             0     22

Caution: Refrain from using the inbuilt function names as variable names. eg, you are not supposed to use the name table as this is a function in base r

EDIT:

With the new data, the procedure remains the same, as long as the lagfirstvisit is numeric:

table%>%
   mutate(result=daytime[1:n()-as.numeric(as.character(lagfirstvisit))])
   ID             daytime lagfirstvisit              result
1 100 2010-12-21 06:00:00             0 2010-12-21 06:00:00
2 100 2010-12-21 09:00:00             0 2010-12-21 09:00:00
3 100 2010-12-21 13:00:00             2 2010-12-21 06:00:00
4 200 2010-12-23 23:00:00             0 2010-12-23 23:00:00
5 200 2010-12-24 02:00:00             1 2010-12-23 23:00:00
6 300 2010-12-25 19:00:00             0 2010-12-25 19:00:00
7 300 2010-12-31 08:00:00             0 2010-12-31 08:00:00
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Hi thanks! But i actually updated my question to reflect a change in the variable. Sorry, but would you mind helping me take a look at that as well? Thank you. Sorry I am new to stack overflow. – Huicong Jul 18 '18 at 08:17
1

I think this is a little cleaner than the current answers:

table %>%
  group_by(ID, lagfirstvisit) %>%
  mutate(result = dplyr::lag(daytime, n = lagfirstvisit[1])) %>%
  ungroup()

Since it's grouped lagfirstvisit all the indexes are the same, so taking the first works ok.

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158