
I have a data set in long format in which each unique ID appears over a time span of 2 to 3 weeks. The IDs enter the data set at different times, and the same ID can enter more than once. I would like to create a unique ID for each time span.

df <- data.frame(id = rep(c("1","2","3","1"), each=2),
                 counter=c(1,2,1,2,1,2,1,2),
                 date_t=rep(seq(c(ISOdate(2021,3,20,9,7)), by = "day", length.out = 2),times=4),
                 task=c("A","B","A","B","A","B","A","B"), stringsAsFactors=FALSE)

This is the expected output: [image not available]

  • I just cannot figure out the relationships between the variables. Would you please explain a little more? – Anoushiravan R Apr 11 '21 at 23:02
  • ID represents a unique patient identifier. There are 360 patients in this data base, with multiple rows per patient. Each row represents an activity (task) completed for the patient. The tasks are completed within a span of up to two weeks, but can take less. I placed a counter per activity. In this fictitious data base I only have two activities, represented as tasks "A" and "B". I can easily create a unique ID per patient name, but since patients can enter the data base multiple times and start the process all over (tasks "A" and "B"), I would like a unique ID per patient per entry into the data base. – Chinwally Apr 11 '21 at 23:18
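Building on the comment above: if `counter` restarts at 1 every time a patient re-enters the data base, a minimal base R sketch can number the entries from that reset alone. It assumes the rows are ordered by patient and date and that `counter == 1` always marks the first activity of a new entry; the column names `visit_id`, `visit_no` and `episode` are hypothetical.

```r
# Same example data as in the question.
df <- data.frame(id = rep(c("1","2","3","1"), each = 2),
                 counter = c(1,2,1,2,1,2,1,2),
                 date_t = rep(seq(ISOdate(2021,3,20,9,7), by = "day", length.out = 2), times = 4),
                 task = c("A","B","A","B","A","B","A","B"), stringsAsFactors = FALSE)

# Assumption: counter == 1 marks the first task of each new entry.
# A running count of those resets gives one global number per entry.
df$visit_id <- cumsum(df$counter == 1)

# Entry number within each patient (1st, 2nd, ... time in the data base),
# computed by applying cumsum separately to each id group.
df$visit_no <- ave(df$counter == 1, df$id, FUN = cumsum)

# Readable per-patient-per-entry key, e.g. "1_2" = patient 1, second entry.
df$episode <- paste(df$id, df$visit_no, sep = "_")
df
```

Unlike an approach based on changes in `id`, this also separates two back-to-back entries by the same patient, provided each one starts again at `counter == 1`.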

1 Answer


Is this what you are looking for?


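library(dplyr)

df1 <-
  df %>%
  # diff() of the numeric ids is non-zero wherever a new block of rows starts;
  # the running sum of the absolute differences therefore increases at each
  # new block, giving every block its own (not necessarily consecutive) number.
  mutate(id_new = c(0, cumsum(abs(diff(as.numeric(id))))))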

df1
#>   id counter              date_t task id_new
#> 1  1       1 2021-03-20 09:07:00    A      0
#> 2  1       2 2021-03-21 09:07:00    B      0
#> 3  2       1 2021-03-20 09:07:00    A      1
#> 4  2       2 2021-03-21 09:07:00    B      1
#> 5  3       1 2021-03-20 09:07:00    A      2
#> 6  3       2 2021-03-21 09:07:00    B      2
#> 7  1       1 2021-03-20 09:07:00    A      4
#> 8  1       2 2021-03-21 09:07:00    B      4
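The `id_new` values above work as a grouping key, but they jump from 2 to 4 (the sum accumulates the size of each id change) and rely on `as.numeric(id)`, which would return `NA` for non-numeric ids. As an alternative sketch, base R's `rle()` can number each run of identical consecutive ids 1, 2, 3, ...; the column name `run_id` is hypothetical.

```r
df <- data.frame(id = rep(c("1","2","3","1"), each = 2),
                 counter = c(1,2,1,2,1,2,1,2),
                 date_t = rep(seq(ISOdate(2021,3,20,9,7), by = "day", length.out = 2), times = 4),
                 task = c("A","B","A","B","A","B","A","B"), stringsAsFactors = FALSE)

# rle() splits the id column into runs of identical consecutive values;
# repeating each run's index by its length labels every row with its run.
runs <- rle(df$id)
df$run_id <- rep(seq_along(runs$lengths), runs$lengths)
df$run_id
# 1 1 2 2 3 3 4 4
```

As with the answer's approach, two consecutive entries by the same patient would be merged into a single run.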

data

df <- data.frame(id = rep(c("1","2","3","1"), each=2),
                 counter=c(1,2,1,2,1,2,1,2),
                 date_t=rep(seq(c(ISOdate(2021,3,20,9,7)), by = "day", length.out = 2),times=4),
                 task=c("A","B","A","B","A","B","A","B"), stringsAsFactors=FALSE)

Created on 2021-04-12 by the reprex package (v2.0.0)
