0

I'm a beginner to R, but I have the below dataframe with more observations in which I have at max each 'id' observation for three years 91, 99, 07.

I want to create a variable avg_ln_rd by 'id' that takes the average of 'ln_rd' and 'ln_rd' from year 91 if the first ln_rd observation is from 99 - and from year 99 if the first ln_rd observation is from 07.

      id  year ln_rd
   <dbl> <dbl> <dbl>
 1  1013  1991 3.51 
 2  1013  1999 5.64 
 3  1013  2007 4.26 
 4  1021  1991 0.899
 5  1021  1999 0.791
 6  1021  2007 0.704
 7  1034  1991 2.58 
 8  1034  1999 3.72 
 9  1034  2007 4.95 
10  1037  1991 0.262

I also already dropped any observations of 'id' that only exist for one of the three years.

My first thought was to create for each year a standalone variable for ln_rd but then i still would need to filter by id which i do not know how to do.

Then I tried using these standalone variables to form an if clause.

df$lagln_rd_99 <- ifelse(df$year == 1999, df$ln_rd_91, NA)

But again I do not know how to keep 'id' fixed.

Any help would be greatly appreciated.

EDIT:

I grouped by id using dplyr. Can I then just sort my df by id and create a new variable that is ln_rd but shifted by one row?

lex_022
  • 3
  • 2
  • Following the logic is it right that condition 1 (first 1999 and 1991 present) is not found in the given data set? – Andre Wildberg Jan 11 '23 at 17:08
  • @AndreWildberg sorry maybe i phrased it poorly - in the end i want to create for each possible id-year combination for the years 99 and 07 an average as follows avg_99 = (x_99+x_91)/2 or avg_07 = (x_07+x_99)/2 – lex_022 Jan 11 '23 at 17:18
  • If for a given `id` the first `ln_rd` observation is from 1999, how do we know what the value from 1991 would have been? – Jon Spring Jan 11 '23 at 19:45
  • It might help if you could show some examples of desired output and what calculation in the given data gets you there. It might be simpler if you run `library(dplyr); df %>% tidyr::complete(id, year) %>% arrange(id, year)` and use that output as your starting point. Are you trying to impute missing values by averaging the others or by filling from the prior on, or something else? – Jon Spring Jan 11 '23 at 19:48
  • i edited it again, sorry - i used your code to make my data more clear. I have for each 'id' 3 'ln_rd' values - for the three years. What I want is to average consecutive year pairs for each 'id' which are 91 & 99 and 99 & 07. I just try to create a variable that averages across two periods, because I need that variable further on. – lex_022 Jan 11 '23 at 20:26
  • as output i would want a column to my df that would be NA for rows in year=1991 and yield averages in rows in year=1999 and year=2007 – lex_022 Jan 11 '23 at 20:28

1 Answers1

1

Still a bit unclear what to do if all years are present in a group but this might help.

-- edited -- to show the desired output.

library(dplyr)

df %>% 
  group_by(id) %>% 
  arrange(id, year) %>% 
  mutate(avg91 = mean(c(ln_rd[year == 1991], ln_rd[year == 1999])), 
         avg99 = mean(c(ln_rd[year == 1999], ln_rd[year == 2007])), 
         avg91 = ifelse(year == 1991, avg91, NA), 
         avg99 = ifelse(year == 2007, avg99, NA)) %>% 
  ungroup()
# A tibble: 15 × 5
    year    id  ln_rd  avg91 avg99
   <int> <int>  <dbl>  <dbl> <dbl>
 1  1991  3505 3.38    3.09  NA
 2  1999  3505 2.80   NA     NA
 3  1991  4584 1.45    1.34  NA
 4  1999  4584 1.22   NA     NA
 5  1991  5709 1.90    2.13  NA
 6  1999  5709 2.36   NA     NA
 7  2007  5709 3.11   NA      2.74
 8  2007  9777 2.36   NA      2.36
 9  1991 18729 4.82    5.07  NA
10  1999 18729 5.32   NA     NA
11  2007 18729 5.53   NA      5.42
12  1991 20054 0.588   0.307 NA
13  1999 20054 0.0266 NA     NA
14  1999 62169 1.91   NA     NA
15  2007 62169 1.45   NA      1.68
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29
  • thank you so much! If you just leave out the "any" it works exacly as i wanted it to -> avg91 = ifelse(year == 1991, avg91, NA), avg99 = ifelse(year == 2007, avg99, NA)) This way it only sets a value in the respective year row. Thank you again! – lex_022 Jan 11 '23 at 21:49
  • @lex_022 Alrigthy, I edited to reflect the desired output. – Andre Wildberg Jan 11 '23 at 22:05