Stata and R:

I have two cross-sectional datasets I'm merging. Both datasets cover the same number of countries, but only one of them has no missing years (year). The problem is that the missing years are simply not recorded, so I think I need to make a new variable that would add the years for which there is no other data. Otherwise, I cannot merge the datasets on the two keys, country and year.

Nick Cox
Max

1 Answer

Not so -- in Stata (and I would be surprised at a problem in R, but others must speak to that).

Missing observations -- better called absent in this and any similar context -- are not a problem. Here's a demonstration. merge is smart enough to notice gaps and make them explicit as missing values. You could "fix" them yourself ahead of the merge, but that is pointless.

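* using dataset: both years for each state
clear
input state year y
1  2019 1
1  2020 2
2  2019 3
2  2020 4
end

* replace lets the example be rerun without an error
save tomerge, replace

* master dataset: 2019 only, so 2020 is absent rather than missing
clear

input state year x
1   2019  42
2   2019  84
end

merge 1:1 state year using tomerge

list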

Results

. merge 1:1 state year using tomerge 

    Result                      Number of obs
    -----------------------------------------
    Not matched                             2
        from master                         0  (_merge==1)
        from using                          2  (_merge==2)

    Matched                                 2  (_merge==3)
    -----------------------------------------

. 
. list 

     +----------------------------------------+
     | state   year    x   y           _merge |
     |----------------------------------------|
  1. |     1   2019   42   1      Matched (3) |
  2. |     2   2019   84   3      Matched (3) |
  3. |     1   2020    .   2   Using only (2) |
  4. |     2   2020    .   4   Using only (2) |
     +----------------------------------------+

Put otherwise, the 1:1 syntax specifies the overall pattern and doesn't rule out 0:1 or 1:0 matches: an observation in one dataset with no counterpart in the other is kept and flagged in _merge. merge will in effect append if identifiers don't match at all. You do need the key variables to exist under identical names in both datasets.
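The point about identifiers that don't match at all can be checked with a small sketch along the same lines as the demonstration. The file name nomatch and the data for state 3 are made up for illustration; nothing here comes from the question's actual datasets.

```stata
* using dataset: a state that never appears in the master (hypothetical data)
clear
input state year y
3 2019 5
3 2020 6
end
save nomatch, replace

* master dataset: same as in the demonstration
clear
input state year x
1 2019 42
2 2019 84
end

* no (state, year) pair is shared, so nothing can match
merge 1:1 state year using nomatch

* every observation should be master only (1) or using only (2)
tabulate _merge
assert _merge != 3
```

If the assert passes silently, no pair matched and the result holds the observations of both datasets, much as append would give. On the last point, if one file stored the key under a different name, say cty (a hypothetical name), then rename cty country before the merge would line the keys up.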

Nick Cox