I have two wide format datasets. Both share a common index column and I would like to combine both datasets into a single wide format dataset based on this common column. A sample of the datasets is provided below.
Let dataset A be:
Wherein column 1 is a list of documents and the wide columns are themes found in those documents (1 if the document mentions the theme, 0 if otherwise)
And let dataset B be:
Wherein column 1 is the same list as in dataset A, and the other columns are countries. Values are a custom code for how "important" that document was for that particular country (e.g. 5 is very important, 1 is unimportant, 0 is no participation in the document).
I would like to combine both into a single wide dataset, wherein rows are countries and columns are the themes. The values inside the cells would equal the sum of occasions when a country engaged with a theme through a document, weighted by the "importance" coding on datset B.
A finished dataset would look like this:
For instance, AFG only participated in document A/C.3/64/L.6, with importance 5, and as this document only mentioned "Access to information", AFG's engagement with the topic is 5. AND, in turn, participated in all documents and so gained an engagement with topics "Abduction" (1*5=5), "Abortion" (1*1=1), "Academic degree" (1*1 + 1*2=3) and "Access to information" (1*4=4).
The problem is that the full datasets A and B have over 1k themes and 190 countries, respectively. So I need to find an automated way to do this merger. I would be grateful for advice on how to do this either on Excel or on R.
Many thanks