
I have looked at this question, but it seems to me that it addresses a different issue: "Find a numeric pattern R"

I have a large data set with multiple observations per id. The number of observations (t in my example below) can vary across ids. I want to find patterns, defined as a subject making the same type of decision (type below) at least three times in a row. My data looks like this:

id <- rep(1:3, each = 5)
t <- rep(1:5, 3)
type <- c("familiar", "familiar", "new", "completely new", "new", "new", "new", "new","new","new","new", "familiar", "completely new", "familiar", "new")
n <- data.frame( id, t, type )
n

How can I find patterns and indicate in a new column how many times they have made the decision to choose a certain type at least three times in a row?

(edit:) My desired output would be a value indicating that a certain type was chosen at least three times in a row, e.g. something along the lines of "familiar_3+" or "new_3+".

Any help is much appreciated.

  • What is your expected output? – Cettt Feb 23 '22 at 15:53
  • Do you want `library(dplyr);library(data.table);n %>% group_by(id, typegrp = rleid(type)) %>% summarise(n = n()) %>% group_by(id) %>% summarise(n = sum(n >=3)) %>% left_join(n)`? – akrun Feb 23 '22 at 15:55
  • Thank you two for your quick responses. I have edited the initial post – honeyimhome Feb 23 '22 at 16:37
  • @honeyimhome instead of `something along the lines of "familiar_3+" or "new_3+".`, please update your post with the actual expected output for the input you showed, to avoid any confusion – akrun Feb 23 '22 at 16:39
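The per-id count asked about in the question ("how many times") can be sketched in base R with `rle`, along the lines of the `rleid` comment above. This is only an illustration under assumptions: the helper `count_runs` and the column `n_runs` are made-up names, and the rows are assumed to be already sorted by `t` within each `id`.

```r
# Example data from the question
id <- rep(1:3, each = 5)
t <- rep(1:5, 3)
type <- c("familiar", "familiar", "new", "completely new", "new",
          "new", "new", "new", "new", "new",
          "new", "familiar", "completely new", "familiar", "new")
n <- data.frame(id, t, type)

# Hypothetical helper: number of runs of identical values
# that are at least `min_run` long
count_runs <- function(x, min_run = 3) {
  sum(rle(as.character(x))$lengths >= min_run)
}

# One count per id, then copied back onto every row of that id
runs_per_id <- tapply(n$type, n$id, count_runs)
n$n_runs <- as.integer(runs_per_id[as.character(n$id)])
n
```

With the example data, only id 2 contains a run of three or more identical types, so it is the only id with a non-zero count.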

1 Answer


If the number of consecutive observations is fixed, you can do this using lag (or lead). Something like the following:

library(dplyr)

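# `df` is the data frame to check (called `n` in the question's example)
df %>%
  group_by(id) %>%
  # type of the previous and second-previous observation, ordered by t
  mutate(prev1_type = lag(type, 1, order_by = t),
         prev2_type = lag(type, 2, order_by = t)) %>%
  # "yes" when the current row completes three identical types in a row;
  # rows without two earlier observations can yield NA
  mutate(consec_x3 = ifelse(type == prev1_type & type == prev2_type, "yes", "no"))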
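If the run length is not fixed, run-length encoding is one possible alternative to the lag approach. The sketch below is an illustration under assumptions, not part of the lag-based code above: it uses base R `rle`, the helper `label_runs` and the column `pattern` are hypothetical names, and the label format follows the "new_3+" style mentioned in the question.

```r
# Example data from the question
id <- rep(1:3, each = 5)
t <- rep(1:5, 3)
type <- c("familiar", "familiar", "new", "completely new", "new",
          "new", "new", "new", "new", "new",
          "new", "familiar", "completely new", "familiar", "new")
n <- data.frame(id, t, type)

# Hypothetical helper: label every row that belongs to a run of at least
# `min_run` identical values as "<type>_<min_run>+", otherwise NA
label_runs <- function(x, min_run = 3) {
  r <- rle(as.character(x))
  run_len <- rep(r$lengths, r$lengths)  # length of the run each row sits in
  ifelse(run_len >= min_run, paste0(x, "_", min_run, "+"), NA_character_)
}

# Sort by id and t so that runs follow the time order, then label per id
n <- n[order(n$id, n$t), ]
n$pattern <- ave(as.character(n$type), n$id, FUN = label_runs)
n
```

Because the labels are computed per id, the number of observations may differ between ids. In the example data, only the five "new" rows of id 2 receive a label.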
Simon.S.A.
  • Thank you Simon, the t can vary across different ids. Do you think there is a possibility to account for that too? – honeyimhome Feb 24 '22 at 09:29
  • I ran your command with the example data but it results in NAs only. I believe there should be a ")" at the end of the second mutate. – honeyimhome Feb 24 '22 at 09:59
  • Good catch on the missing bracket. I have also fixed a typo: it should be `order_by = t` instead of `order_by = "t"`. – Simon.S.A. Feb 24 '22 at 19:51
  • Right now column `t` is only used to sort/order the rows. So the exact values in this column do not matter (so long as they can be sorted). If some IDs have a repeated `t` value then the results will not be stable and could vary between runs. If you also want to require that the `t` are consecutive, add a lag for t ( `prev1_t = lag(t, 1, order_by = t)` ) and check for consecutive periods ( `t == prev1_t + 1` ). – Simon.S.A. Feb 24 '22 at 19:54
  • I think this is a nice way to do what I want. Thank you very much for your help. Much appreciated. – honeyimhome Feb 25 '22 at 09:20
  • I would love to do that though I do need 15 reputation to upvote on questions and I currently only have 11... – honeyimhome Mar 04 '22 at 12:16
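The consecutive-`t` check suggested in the comments could look roughly like the sketch below. It assumes dplyr is installed, uses made-up data with a gap in `t` for id 1, and extends the comment's idea by also lagging `t` twice (`prev2_t`, a name introduced here) so that all three periods must be adjacent. Rows without two earlier observations are treated as "no" rather than NA, which is a choice of this sketch.

```r
library(dplyr)

# Hypothetical data: id 1 has no observation for t = 3
df <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2),
  t    = c(1, 2, 4, 5, 1, 2, 3),
  type = c("new", "new", "new", "new", "new", "new", "new")
)

res <- df %>%
  group_by(id) %>%
  # previous two types and previous two time points, ordered by t
  mutate(prev1_type = lag(type, 1, order_by = t),
         prev2_type = lag(type, 2, order_by = t),
         prev1_t    = lag(t, 1, order_by = t),
         prev2_t    = lag(t, 2, order_by = t)) %>%
  # same type three times in a row AND three adjacent periods
  mutate(consec_x3 = ifelse(
    !is.na(prev2_type) &
      type == prev1_type & type == prev2_type &
      t == prev1_t + 1 & prev1_t == prev2_t + 1,
    "yes", "no")) %>%
  ungroup()

res
```

Here only the last row of id 2 is flagged, because id 1 never has three adjacent periods.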