I am working with panel data and have found discrepancies in the age variable. For some respondents, age increases by more than 1 or decreases from one year to the next, as with the respondents with ID 2 and 3 below. This could be due to data-entry errors or other causes that I cannot fix myself.

How can I create a new variable that flags respondents whose age either increased by more than 1 or decreased from one year to the next, as happens for ID 2 and 3 below?

id  age year
1   25  2005
1   26  2006
1   27  2007
2   50  2006
2   51  2007
2   36  2008
3   25  2005
3   30  2006


structure(list(id = structure(c(1, 1, 1, 2, 2, 2, 3, 3), format.stata = "%9.0g"), 
    age = structure(c(25, 26, 27, 50, 51, 36, 25, 30), format.stata = "%9.0g"), 
    year = structure(c(2005, 2006, 2007, 2006, 2007, 2008, 2005, 
    2006), format.stata = "%9.0g")), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))
Jack

1 Answer

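You can use dplyr's group_by() and lag() to compute the change in age between consecutive observations of each respondent, then flag the changes that are too large or negative.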

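library(dplyr)

df %>%
  arrange(id, year) %>%   # sort so lag() refers to the previous year
  group_by(id) %>%        # compute changes within each respondent
  mutate(
    age_change = age - lag(age),        # NA for each respondent's first row
    age_bigincrease = age_change > 1,   # increased by more than 1
    age_decrease = age_change < 0       # decreased
  )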

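would return the following (the first row of each id is NA because there is no earlier observation to compare against):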

# A tibble: 8 x 6
# Groups:   id [3]
     id   age  year age_change age_bigincrease age_decrease
  <dbl> <dbl> <dbl>      <dbl> <lgl>           <lgl>       
1     1    25  2005         NA NA              NA          
2     1    26  2006          1 FALSE           FALSE       
3     1    27  2007          1 FALSE           FALSE       
4     2    50  2006         NA NA              NA          
5     2    51  2007          1 FALSE           FALSE       
6     2    36  2008        -15 FALSE           TRUE        
7     3    25  2005         NA NA              NA          
8     3    30  2006          5 TRUE            FALSE  
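
If a single indicator is preferred over two separate columns, the checks above can be combined. The following is a minimal sketch rather than a definitive solution: the column names `age_flag` and `id_flag` and the object name `flagged` are made up for illustration, and the data frame is rebuilt from the values in the question so the snippet runs on its own. It assumes that the first observation of each respondent should count as not flagged.

```r
library(dplyr)

# Example data copied from the question
df <- tibble(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3),
  age  = c(25, 26, 27, 50, 51, 36, 25, 30),
  year = c(2005, 2006, 2007, 2006, 2007, 2008, 2005, 2006)
)

flagged <- df %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(
    age_change = age - lag(age),
    # Row-level flag: increase of more than 1, or any decrease.
    # coalesce() turns the NA of each respondent's first row into FALSE
    # (an assumption; drop it to keep the NA instead).
    age_flag = coalesce(age_change > 1 | age_change < 0, FALSE),
    # Respondent-level flag: TRUE on every row of an id with any flagged row
    id_flag = any(age_flag)
  ) %>%
  ungroup()
```

With this, `filter(flagged, id_flag)` keeps all rows for the respondents with at least one inconsistent age, which for the example data are ids 2 and 3. Note that the threshold of 1 only makes sense when consecutive rows are one year apart; if the panel has gaps, comparing `age_change` with `year - lag(year)` may be more appropriate.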
Daniel R