0

I have a survey composed of n individuals; each individual is present more than one time in the survey (panel). I have a variable pens, which is a dummy that takes value 1 if the individual invests in a complementary pension form. For example:

df <- data.frame(year=c(2002,2002,2004,2004,2006,2008), id=c(1,2,1,2,3,3), y.b=c(1950,1943,1950,1943,1966,1966), sex=c("F", "M", "F", "M", "M", "M"), income=c(100000,55000,88000,66000,12000,24000), pens=c(0,1,1,0,1,1))

year  id  y.b   sex   income   pens   
2002  1   1950   F    100000     0     
2002  2   1943   M    55000      1    
2004  1   1950   F    88000      1    
2004  2   1943   M    66000      0    
2006  3   1966   M    12000      1    
2008  3   1966   M    24000      1    

where id is the individual, y.b is year of birth, pens is the dummy variable regarding complementary pension.

I want to know if there are individuals that invested in a complementary pension form in year t but didn't hold the complementary pension form in year t+2 (the survey is conducted every two years). In this way I want to know how many person had a complementary pension form but released it before pension or gave up (for example for economic reasons).

I tried with this command:

df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")

and actually I have the individuals whose pens variable had changed during time (the command check if a variable is constant in time). For this reason I find individuals whose pens variable changed from 0 (didn't have complementary pension) in year t to 1 in year t+2 and viceversa; but I am interested in individuals whose pens variable was 1 (had a complementary pensione) in year t and 0 in year t+2.

If I use this command with the df I get that for id 1 and 2 the variable x is 0 (pens variable isn't constant), but I'd need to find a way to get just id 2 (whose pens variable changed from 1 to 0).

df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")

  year id pens x
1 2002  1    0 0
2 2002  2    1 0
3 2004  1    1 0
4 2004  2    0 0
5 2006  3    1 1
6 2008  3    1 1

(for the sake of semplicity I omitted other variables)

So the desired output is:

  year id pens x
1 2002  1    0 1
2 2002  2    1 0
3 2004  1    1 1
4 2004  2    0 0
5 2006  3    1 1
6 2008  3    1 1

only id 2 has x=0 since the pens variable changed from 1 to 0.

Thanks in advance

Laura R.
  • 99
  • 1
  • 10
  • What is your expected output? – akrun Oct 11 '16 at 09:34
  • In the future please show your input and expected output in reproducible form. For the input this time I have done it for you in the Note at the end of my answer. See [mcve] for more info. – G. Grothendieck Oct 11 '16 at 09:51

1 Answers1

1

This assigns 1 to the id's for which there is a decline in pens and 0 otherwise.

transform(d.d, x = ave(pens, id, FUN = function(x) any(diff(x) < 0)))

giving:

  year id  y.b sex income pens   x
1 2002  1 1950   F 100000    0   0
2 2002  2 1943   M  55000    1   1
3 2004  1 1950   F  88000    1   0
4 2004  2 1943   M  66000    0   1
5 2006  3 1966   M  12000    1   0
6 2008  3 1966   M  24000    1   0

This should work even even if there are more than 2 rows per id but if we knew there were always 2 rows then we could omit the any simplifying it to:

transform(d.d, x = ave(pens, id, FUN = diff) < 0)

Note: The input in reproducible form is:

Lines <- "year  id  y.b   sex   income   pens   
2002  1   1950   F    100000     0     
2002  2   1943   M    55000      1    
2004  1   1950   F    88000      1    
2004  2   1943   M    66000      0    
2006  3   1966   M    12000      1    
2008  3   1966   M    24000      1"

d.d <- read.table(text = Lines, header = TRUE, check.names = FALSE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you @G. Grothendieck this is the answer I was looking for. Thank you also for the advice, I already edited the question. – Laura R. Oct 11 '16 at 10:03