0

I have the following (sample) dput() for a dataframe consisting of non-unique IDs, countries, and columns representing country names:

structure(list(id = c(1, 1, 2, 3, 4, 4), country = c("USA", "Japan", 
"Germany", "Austria", "Japan", "Austria"), USA = c(0, 0, 0, 0, 
0, 0), Germany = c(0, 0, 0, 0, 0, 0), Japan = c(0, 0, 0, 0, 0, 
0), Austria = c(0, 0, 0, 0, 0, 0), Belgium = c(0, 0, 0, 0, 0, 
0)), class = "data.frame", row.names = c(NA, -6L)) 

I would like to alter the above dataframe such that, for every id i, each column with a country name "I.e., df$Germany, df$Japan" has a value == 1 if df$country == 1. I.e., rows 1 and 2 should have df$Japan == 1 and df$USA == 1; rows 5 and 6 should have df$Japan == 1 and df$Austria == 1.

Are there any ways to do this without using for loops or pivot? (notice my current data has a "Belgium" column but no Belgium value in df$country -- pivot will not work here as far as I understand).

Someone on a different thread suggested: df[levels(factor(df$country))] = model.matrix(~country - 1, df) but this gives an erroneous structure, the dput being given below:

structure(list(id = c(1, 1, 2, 3, 4, 4), country = c("USA", "Japan", 
"Germany", "Austria", "Japan", "Austria"), USA = c(1, 0, 0, 0, 
0, 0), Germany = c(0, 0, 1, 0, 0, 0), Japan = c(0, 1, 0, 0, 1, 
0), Austria = c(0, 0, 0, 1, 0, 1), Belgium = c(0, 0, 0, 0, 0, 
0)), row.names = c(NA, -6L), class = "data.frame") 

I've also thought about using mutate() with cur_column() but I'm not sure as to how to even write that code.

ZZ Top
  • 93
  • 5
  • That is my question, yes--none of those solutions solved my problem. I need to avoid for loops because of many observations (100k+ per dataset, and I have 30+ datasets) and pivot because the datasets don't list every country (i.e., in my above sample, there is no `df$country == 'Belgium'` – ZZ Top Jul 08 '22 at 14:51
  • I have added a solution to your previous question here – TarJae Jul 08 '22 at 15:14

0 Answers0