I have two datasets, and what I want may be loosely termed as "outer join on a non-key variable".
Here are the datasets
Dataset 1
oc oc2 state_id r_state
A011 A01 1808 1.00
A011 A01 1810 0.50
A012 A01 1810 0.50
A011 A01 1814 0.33
A012 A01 1814 0.33
A013 A01 1814 0.33
Dataset 2
oc r_country
A011 0.62
A012 0.14
A013 0.24
what I want as output is following:
oc oc2 state_id r_state r_country
A011 A01 1808 1.00 0.62
A012 A01 1808 NA 0.14
A013 A01 1808 NA 0.24
A011 A01 1810 0.50 0.62
A012 A01 1810 0.50 0.14
A013 A01 1810 NA 0.24
A011 A01 1814 0.33 0.62
A012 A01 1814 0.33 0.62
A013 A01 1814 0.33 0.24
Note how the row numbers 2, 3 and 6 has been introduced "artificially". My question is how to do this in R. If I merge by oc
, merge
function will not create these rows. If I merge by oc2
, it will create unnecessary extra rows. Note that oc2
is just a higher level coding of oc
. Below are few lines of code to get the above datasets in data.tables. Note that these are sample datasets and actual data contain around 50 different oc2
and each oc2
can have 1 to 9 oc
in it. Also, there are 47 different states.
DT1 = data.table(oc = c('A011','A011','A012','A011','A012','A013'),
oc2 = rep('A01',6),
state_id = c(1808,1810,1810,1814,1814,1814),
r_state = c(1, 0.5,0.5,0.33,0.33,0.33)
)
DT2 = data.table(oc = c('A011','A012','A013'),
r_country = c(0.62, 0.14, 0.24)
)
Though I am using data.table
, base data.frame
solution will also do.