
I have data in long format, similar to the following:

id <- c(rep(c(1L,2L,3L),3))

year <- c(rep(c(11,12,13),3))

df <- data.frame(id, year)[-c(8,3),]

df$factor <- factor(c("a", "b", "a", "c", "d","a","d"))

df

I would like to create an indicator variable that takes a value when the factor has changed (e.g. 1 for a change, 0 for no change), on the year the change appears. Is there an efficient way of doing this?

I found this question, "Identifying where value changes in R data.frame column", which is somewhat related but does not deal with the ids.

SushiChef

1 Answer


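You are probably looking for the following, which compares each value with the previous one within the same id. The result is a logical column (TRUE for a change); wrap the comparison in as.integer() if you need 1/0: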

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(flag = factor != lag(factor, default = first(factor)))

#    id  year factor flag 
#  <int> <dbl> <fct>  <lgl>
#1     1    11 a      FALSE
#2     2    12 b      FALSE
#3     1    11 a      FALSE
#4     2    12 c      TRUE 
#5     3    13 d      FALSE
#6     1    11 a      FALSE
#7     3    13 d      FALSE

And in data.table:

library(data.table)
setDT(df)[, flag := factor != shift(factor, fill = first(factor)), id]
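
If you would rather not load any packages, a base R sketch along the same lines might look like the following. It assumes the rows are already ordered by year within each id, as in the example data, and the helper name `changed` is made up here for illustration. It returns 1/0 directly instead of TRUE/FALSE.

```r
# Rebuild the example data from the question
id <- rep(c(1L, 2L, 3L), 3)
year <- rep(c(11, 12, 13), 3)
df <- data.frame(id, year)[-c(8, 3), ]
df$factor <- factor(c("a", "b", "a", "c", "d", "a", "d"))

# Hypothetical helper: compare each value with the one before it.
# The first row of a group has no predecessor, so it is set to 0.
changed <- function(x) c(0L, as.integer(x[-1] != x[-length(x)]))

# ave() applies the helper within each id and keeps the original row order.
# The factor is compared through its integer codes.
df$flag <- ave(as.integer(df$factor), df$id, FUN = changed)

df
```

If the data are not sorted, ordering first with something like df[order(df$id, df$year), ] should make the "previous row" comparison meaningful.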
Ronak Shah