The following problem occurred in my work with a longitudinal dataframe.
There are several waves (years) in the study. Every wave contains an identification number (ID) that stays the same for each person, plus some variables of interest that I want to bring together in one dataframe.
For example:
wave_1 <- data.frame(ID=c(1, 2, 3, 4, 5, 6),
income_wave_1=c(109, 106, 67, NA, 65, 190))
wave_2 <- data.frame(ID=c(1, 2, 3, 4, 5, 6),
income_wave_2=c(NA, 120, 34, 76, 69, 160))
wave_3 <- data.frame(ID=c(1, 2, 3, 4, 5, 6),
income_wave_3=c(34, 64, 78, NA, 170, 200))
In the end I want to calculate each person's mean income across wave_1, wave_2 and wave_3. In every wave there are over 500 variables measured for every person (ID), so I just want to extract the ID and the variable of interest (income). Until now I did it manually via:
library(haven)  # read_dta()
library(dplyr)  # select(), full_join(), mutate(), %>%
wave_1 <- read_dta("C:/User/all_waves/wave1.dta")
wave_2 <- read_dta("C:/User/all_waves/wave2.dta")
wave_3 <- read_dta("C:/User/all_waves/wave3.dta")
wave_1 <- wave_1 %>% select(ID, income_wave_1)
wave_2 <- wave_2 %>% select(ID, income_wave_2)
wave_3 <- wave_3 %>% select(ID, income_wave_3)
df_list_all_waves <- list(wave_1, wave_2, wave_3)
df_final_all_waves <- Reduce(function(x, y) full_join(x, y, by = "ID"), df_list_all_waves)
df_final <- df_final_all_waves %>% mutate(average_income_w1_to_w3 = rowMeans(df_final_all_waves[, 2:4], na.rm = TRUE)) %>% select(ID, average_income_w1_to_w3)
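One way the manual steps above might be generalized is sketched below. It is only a rough outline under some assumptions: it uses the toy data frames from the example instead of the real .dta files, it relies on base R only (merge() instead of full_join()), it assumes every wave names its income column income_wave_<k>, and pick_income is a hypothetical helper name introduced just for illustration.

```r
# Toy stand-ins for the real waves (same values as in the example above).
waves <- list(
  data.frame(ID = c(1, 2, 3, 4, 5, 6), income_wave_1 = c(109, 106, 67, NA, 65, 190)),
  data.frame(ID = c(1, 2, 3, 4, 5, 6), income_wave_2 = c(NA, 120, 34, 76, 69, 160)),
  data.frame(ID = c(1, 2, 3, 4, 5, 6), income_wave_3 = c(34, 64, 78, NA, 170, 200))
)

# Hypothetical helper: keep only ID and the income column of wave k.
# Assumes the column is called income_wave_<k>.
pick_income <- function(df, k) {
  df[, c("ID", paste0("income_wave_", k)), drop = FALSE]
}

# Apply the helper to every wave, pairing each data frame with its index.
income_list <- Map(pick_income, waves, seq_along(waves))

# Full outer join on ID across all waves.
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), income_list)

# Row-wise mean over all income columns, ignoring missing values.
income_cols <- grep("^income_wave_", names(merged), value = TRUE)
df_final <- data.frame(
  ID = merged$ID,
  average_income_w1_to_w3 = rowMeans(merged[, income_cols], na.rm = TRUE)
)
```

With the real files, the toy list would presumably be replaced by something like lapply() over a vector of file paths with read_dta(). As far as I can tell from the haven documentation, read_dta() also accepts a col_select argument, which might avoid loading all 500+ variables per wave, but I have not verified that on this data.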
Thanks a lot!