
I have a CSV file that looks like this:

[image: the input CSV]

What I want to get is this:

[image: the desired output, with an activity instance identifier added]

The activity instance is needed to identify which events belong together and which do not. This instance identifier should be unique, also across different cases and activities. I have no idea how to generate those IDs. Is there a library, for example in Python, that could handle this?

1 Answer


In R you could try the following using dplyr.

Using arrange, you can ensure your data is sorted by patient and in chronological order. The activity_instance is then a counter that is incremented whenever the patient or the activity changes from one row to the next.

library(dplyr)

df %>%
  arrange(patient, timestamp) %>%
  mutate(activity_instance = 1 + cumsum(
    (patient != lag(patient, default = first(patient)) |
     activity != lag(activity, default = first(activity)))))
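
Since the question asks about Python, the same idea can be sketched with the standard library alone. This is only a minimal sketch, not a tested solution for your file: it assumes the rows have already been read into dictionaries (for example with csv.DictReader), that the columns are named patient, activity and timestamp as in the R code above, and that the timestamps are strings that sort chronologically (such as ISO 8601). The function name add_activity_instance and the sample rows are made up for illustration.

```python
from operator import itemgetter


def add_activity_instance(rows):
    """Return sorted copies of the rows with an 'activity_instance' counter added."""
    # Sort by patient, then chronologically (assumes sortable timestamps).
    ordered = sorted(rows, key=itemgetter("patient", "timestamp"))

    result = []
    instance = 0
    previous = None  # (patient, activity) of the preceding row
    for row in ordered:
        current = (row["patient"], row["activity"])
        # Start a new instance whenever the patient or the activity changes.
        if current != previous:
            instance += 1
        result.append({**row, "activity_instance": instance})
        previous = current
    return result


# Hypothetical sample data, not taken from the question.
rows = [
    {"patient": "A", "activity": "surgery", "timestamp": "2020-01-01 11:00"},
    {"patient": "A", "activity": "check-in", "timestamp": "2020-01-01 09:00"},
    {"patient": "A", "activity": "check-in", "timestamp": "2020-01-01 09:30"},
    {"patient": "B", "activity": "check-in", "timestamp": "2020-01-01 10:00"},
]

labelled = add_activity_instance(rows)
for row in labelled:
    print(row["patient"], row["timestamp"], row["activity"], row["activity_instance"])
```

Because the counter never resets, the resulting IDs are unique across patients and activities, which is what the question asks for.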