I have been attempting to merge cross sectional data sets which were acquired at different years from different people.
For data collection, most of the same questions were asked per year but some questions where added or deleted. Hence, there is some variables that match across datasets and some others that are not matching but are still important.
Something that might be important for you all to keep in mind is that there are different number of respondents per year. Hence not all variables have the same number of elements per matching variable.
For context I am attempting to merge three data sets. But I will illustrate my below examples with 2 of the 3 for simplicities sake.
I have tried the match() function with all = TRUE but the data set I created by using this function made 3 vectors off of the vector I wanted stacked. e.g.
internet.x internet.y internet.z
3 3 7
6 4 5
I have also tried the rbind() function from the plyr package but this mode of merging deletes the columns that do not have matching elements.
So for example, since data: year2017 and data:year2018 both have a variable titles YEAR e.g.
data:year2017 data:year2018
YEAR YEAR
2017 2018
2017 2018
2017 2018
2017 2018
2017 2018
2017 2018
2017 2018
2017 2018
The YEAR variable gets deleted in the merging product because the same variable has different values or elements within different datasets.
So... what I want to keep in the finalized product is a merged result of
data:MERGED
YEAR
2017
2017
2017
2017
2017
2017
2017
2017
2018
2018
2018
2018
2018
2018
2018
2018
Another example is the following variable = var1 which is named the same across data sets
data:year2016 data:year2017 data:year2018
var1 var1 var1
3 5 2
2 3 1
4 7 7
5 8 3
6 3 4
The resulting product ideally would be
data:MERGEDFINAL
var1
3
2
4
5
6
5
3
7
8
3
2
1
7
3
4
What I want there to happen is that for all variables that are the same across data sets there should be a stacking action conducted. For the variables that are not the same, then the stacking should still occur but be packed with NA's for the respondents who took the survey in the year wherein there was no data collected for that variable.
If you all could put your brain power and experience together and help me out with this one that would be great :):):)