2

I have been attempting to merge cross sectional data sets which were acquired at different years from different people.

For data collection, most of the same questions were asked per year but some questions where added or deleted. Hence, there is some variables that match across datasets and some others that are not matching but are still important.

Something that might be important for you all to keep in mind is that there are different number of respondents per year. Hence not all variables have the same number of elements per matching variable.

For context I am attempting to merge three data sets. But I will illustrate my below examples with 2 of the 3 for simplicities sake.

I have tried the match() function with all = TRUE but the data set I created by using this function made 3 vectors off of the vector I wanted stacked. e.g.

    internet.x internet.y internet.z
        3          3         7 
        6          4         5

I have also tried the rbind() function from the plyr package but this mode of merging deletes the columns that do not have matching elements.

So for example, since data: year2017 and data:year2018 both have a variable titles YEAR e.g.

    data:year2017  data:year2018
    YEAR              YEAR
    2017              2018
    2017              2018
    2017              2018
    2017              2018
    2017              2018
    2017              2018
    2017              2018
    2017              2018

The YEAR variable gets deleted in the merging product because the same variable has different values or elements within different datasets.

So... what I want to keep in the finalized product is a merged result of

    data:MERGED
    YEAR
    2017              
    2017         
    2017         
    2017    
    2017
    2017         
    2017              
    2017
    2018              
    2018
    2018
    2018
    2018
    2018
    2018
    2018

Another example is the following variable = var1 which is named the same across data sets

    data:year2016  data:year2017   data:year2018  
    var1               var1             var1 
    3                   5                2 
    2                   3                1 
    4                   7                7 
    5                   8                3 
    6                   3                4 

The resulting product ideally would be

    data:MERGEDFINAL
    var1
     3
     2
     4
     5
     6
     5
     3
     7
     8
     3
     2
     1 
     7
     3
     4 

What I want there to happen is that for all variables that are the same across data sets there should be a stacking action conducted. For the variables that are not the same, then the stacking should still occur but be packed with NA's for the respondents who took the survey in the year wherein there was no data collected for that variable.

If you all could put your brain power and experience together and help me out with this one that would be great :):):)

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
JFC_Mx
  • 23
  • 5

1 Answers1

2

The bind_rows() function from the dplyr library is what you need! To 'merge' three datasets into one, while respecting column names, use the command like this:

library(dplyr)
dfAll<-bind_rows(dfA, dfB, dfC)

Edit: Update, directly call all three datasets. Removed intermediate step as first posted.

M.Viking
  • 5,067
  • 4
  • 17
  • 33