R extracting multiple sequences within sessions & participants (RLE)

Question

I have a time series dataset in which participants performed a series of actions, each identified by a code (approx. 1-25). Participants could repeat any action any number of times. I am trying to collapse sequential, repeated instances of action #6 into one run length each. The difficulty is that within a single session a participant may repeat action #6 several times in a row (say 4 times), do other actions, and then repeat action #6 again (say 2 times). I need both the 4 and the 2. Because the session and participant are the same for both runs, I have no grouping variable that lets me collapse them while keeping the runs separate.

I have tried this code, courtesy of the question "data frame cumulative run length encoding in R":

x <- rle(full_data2$action_name)         ## run-length encode the relevant column
new <- sequence(x$lengths)               ## within-run counter: 1, 2, ..., n for each run
full_data2$rle <- new                    ## store as a plain integer column, not a nested data frame

This did create a column with a running count within each run. But I am struggling to extract just the final (highest) count of each run for each student, because there is no other variable I can use to group and collapse the rows.
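To make the behaviour of that snippet concrete, here is a minimal sketch on a made-up vector (the `codes` values are hypothetical, not taken from the real data). It shows that `rle()` reports the value and length of each run, that `sequence()` turns the lengths into a within-run counter, and that the run lengths for one code can be read directly from the `rle()` result.

```r
# Hypothetical toy vector of action codes (not the real data)
codes <- c(25L, 6L, 6L, 6L, 19L, 6L, 6L)

runs <- rle(codes)
runs$values   # value of each run:  25  6 19  6
runs$lengths  # length of each run:  1  3  1  2

# Within-run counter, one entry per original row
run_pos <- sequence(runs$lengths)
run_pos       # 1 1 2 3 1 1 2

# Lengths of the runs of code 6 only, in order of appearance
six_runs <- runs$lengths[runs$values == 6L]
six_runs      # 3 2
```

The highest counter value in each run equals that run's length, so the lengths already hold the numbers being asked for.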

How can I collapse this so that only the highest count of each run is retained within the session? In the sample data, I need 13, 6, and 2 (the lengths of the three runs of code 6). Here is the dput output for the sample data:


structure(list(student_id = c(3935850L, 3935850L, 3935850L, 3935850L, 
3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 
3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 
3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 3935850L, 
3935850L, 3935850L, 3935850L), act_time = structure(c(1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 6L, 6L, 7L, 8L, 9L, 
9L, 9L, 9L, 10L, 10L, 10L, 11L, 12L, 12L, 12L), .Label = c("2017-12-10 00:39:52", 
"2017-12-10 00:40:17", "2017-12-10 00:40:18", "2017-12-10 00:40:19", 
"2017-12-10 00:40:36", "2017-12-10 00:40:37", "2017-12-10 00:40:38", 
"2017-12-10 00:40:42", "2017-12-10 00:41:03", "2017-12-10 00:41:04", 
"2017-12-10 00:41:08", "2017-12-10 00:41:45"), class = "factor"), 
    code = c(25L, 19L, 25L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 6L, 6L, 19L, 25L, 6L, 6L, 6L, 6L, 6L, 6L, 19L, 25L, 
    6L, 6L), sequence = c(1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 
    7L, 8L, 9L, 10L, 11L, 12L, 13L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 
    6L, 1L, 1L, 1L, 2L)), .Names = c("student_id", "act_time", 
"code", "sequence"), row.names = c(NA, -28L), class = "data.frame")

Please post data in a reproducible and copy&paste-able format (using e.g. `dput`). Screenshots of data/code are never a good idea because we can't easily extract information. — Maurits Evers, Mar 26 '19 at 01:25
Thank you for your response. Please find .csv here: https://www.dropbox.com/s/atitbf4iqt572mz/sample.csv?dl=0. I couldn't successfully output using dput. — Jeanne Sinclair, Mar 26 '19 at 02:43
Sorry but a lot of people (myself included) are loath to download data from 3rd party file hosters and cloud storage services. That's why SO guidelines recommend using `dput` (which should work in particular with custom R objects). That aside, I actually did follow your link but could not download any of the data. — Maurits Evers, Mar 26 '19 at 02:55
Sorry, Maurits, and thanks. After some troubleshooting I was able to get the dput output, above. — Jeanne Sinclair, Mar 26 '19 at 15:05
Are you looking for `r <- rle(full_data2$code); r$lengths[r$values==6]`? And for each student, maybe `setDT(full_data2)[, {r <- rle(code); r$lengths[r$values==6L]}, by=.(student_id)]`? — chinsoon12, Mar 27 '19 at 00:57
@chinsoon12 that did the trick! thank you, thank you, thank you! — Jeanne Sinclair, Mar 27 '19 at 16:42
My only other question now is remerging that new dataframe back as a new column in the original dataset, preserving the order of the RLEs that were extracted....? Is it possible to amend @chinsoon12's code so it also includes act_time? I tried with adding r$time[full_data2$act_time] but that did not work. Again, many thanks! — Jeanne Sinclair, Mar 27 '19 at 16:52
Thank you for your help -- I just used "lead" and was able to get the highest values for each sequential, repeated action: `df = full_data2 %>% filter(lead(rle)==1)` — Jeanne Sinclair, Mar 27 '19 at 20:50
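For readers who want the per-student version in base R, the following is a minimal sketch, not the method used in the thread. The helper name `run_lengths_by_student` and the `end_row` column are assumptions introduced for illustration, and the data frame is rebuilt from the `code` values of the dput sample only. One caveat about the `lead()` filter: with dplyr's default padding, `lead(rle) == 1` is `NA` on the last row, and `filter()` drops `NA` rows, so a run that ends on the final row (the 2 in the sample) may be missed.

```r
# Rebuild the code column of the sample compactly (same 28 values as the dput)
full_data2 <- data.frame(
  student_id = 3935850L,
  code = c(25L, 19L, 25L, rep(6L, 13), 19L, 25L, rep(6L, 6), 19L, 25L, rep(6L, 2))
)

# Hypothetical helper: run lengths of one target code, computed per student
run_lengths_by_student <- function(df, target = 6L) {
  pieces <- lapply(split(df, df$student_id), function(d) {
    r <- rle(d$code)
    keep <- r$values == target
    if (!any(keep)) return(NULL)
    data.frame(
      student_id = d$student_id[1],
      end_row    = cumsum(r$lengths)[keep],  # last row of each run within the student
      run_length = r$lengths[keep]
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

res <- run_lengths_by_student(full_data2)
res$run_length  # 13 6 2
res$end_row     # 16 24 28
```

Because `end_row` indexes the last row of each run within a student's rows, it could be used to look up other columns for that row, such as `act_time`, if they are present in the data frame.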
