I have a data frame with information about some countries and states, like this:

df <- data.frame("state1"   = c(NA,NA,"Beijing","Beijing","Schleswig-Holstein","Moskva",NA,"Moskva",NA,"Berlin"),
                 "country1" = c("Spain","Spain","China","China","Germany","Russia","Germany","Russia","Germany","Germany"),
                 "state2"   = c(NA,NA,"Beijing",NA,NA,NA,"Moskva",NA,NA,NA),
                 "country2" = c("Germany","Germany","China","Germany","","Ukraine","Russia","Germany","Ukraine",""),
                 "state3"   = c(NA,NA,NA,NA,"Schleswig-Holstein",NA,NA,NA,NA,"Berlin"),
                 "country3" = c("Spain","Spain","Germany","Germany","Germany","Germany","Germany","Germany","Germany","Germany"))

Now I would like to create a new column containing the German state for each row (the expected result is shown below). When at least one of the three state variables holds a German state, that state should be assigned to the new variable; otherwise the new variable should be NA.

data.frame("GE_State"=c(NA,NA,NA,NA, "Schleswig-Holstein",NA,NA,NA,NA,"Berlin"))

I am a beginner and would appreciate help with setting up the condition. Thank you in advance!

sylvia

2 Answers

Using dplyr::mutate() with case_when() works, although I suspect there is a more concise way using across().


library(dplyr)

# Take the state from the first state/country pair whose country is Germany
df %>%
  mutate(GE_state = case_when(country1 == "Germany" & !is.na(state1) ~ state1,
                              country2 == "Germany" & !is.na(state2) ~ state2,
                              country3 == "Germany" & !is.na(state3) ~ state3,
                              TRUE ~ NA_character_))


#>                state1 country1  state2 country2             state3 country3
#> 1                <NA>    Spain    <NA>  Germany               <NA>    Spain
#> 2                <NA>    Spain    <NA>  Germany               <NA>    Spain
#> 3             Beijing    China Beijing    China               <NA>  Germany
#> 4             Beijing    China    <NA>  Germany               <NA>  Germany
#> 5  Schleswig-Holstein  Germany    <NA>          Schleswig-Holstein  Germany
#> 6              Moskva   Russia    <NA>  Ukraine               <NA>  Germany
#> 7                <NA>  Germany  Moskva   Russia               <NA>  Germany
#> 8              Moskva   Russia    <NA>  Germany               <NA>  Germany
#> 9                <NA>  Germany    <NA>  Ukraine               <NA>  Germany
#> 10             Berlin  Germany    <NA>                      Berlin  Germany
#>              GE_state
#> 1                <NA>
#> 2                <NA>
#> 3                <NA>
#> 4                <NA>
#> 5  Schleswig-Holstein
#> 6                <NA>
#> 7                <NA>
#> 8                <NA>
#> 9                <NA>
#> 10             Berlin

Created on 2021-03-31 by the reprex package (v1.0.0)
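
If more state/country pairs are added later, the three case_when() branches could be replaced by a loop over the column suffixes. The following is only a sketch in base R, not something I have benchmarked: it assumes the columns are always named state1..stateN and country1..countryN, and the names `suffixes` and `candidates` are made up for illustration.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the
# columns as character vectors on R versions before 4.0 as well.
df <- data.frame(
  state1   = c(NA, NA, "Beijing", "Beijing", "Schleswig-Holstein", "Moskva", NA, "Moskva", NA, "Berlin"),
  country1 = c("Spain", "Spain", "China", "China", "Germany", "Russia", "Germany", "Russia", "Germany", "Germany"),
  state2   = c(NA, NA, "Beijing", NA, NA, NA, "Moskva", NA, NA, NA),
  country2 = c("Germany", "Germany", "China", "Germany", "", "Ukraine", "Russia", "Germany", "Ukraine", ""),
  state3   = c(NA, NA, NA, NA, "Schleswig-Holstein", NA, NA, NA, NA, "Berlin"),
  country3 = c("Spain", "Spain", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany"),
  stringsAsFactors = FALSE
)

# Assumption: the pairs are numbered 1..3, as in the question.
suffixes <- 1:3

# For each pair, keep the state only where the matching country is Germany.
candidates <- lapply(suffixes, function(i) {
  state   <- df[[paste0("state", i)]]
  country <- df[[paste0("country", i)]]
  ifelse(country == "Germany", state, NA_character_)
})

# Row by row, take the first non-missing candidate across the pairs.
df$GE_state <- Reduce(function(a, b) ifelse(is.na(a), b, a), candidates)

print(df$GE_state)
```

Like the case_when() version, this returns the first German state found when a row has more than one, and NA when a row has none.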

Peter

I think you want cbind() here:

df1 <- cbind(df1, df2)

Data:

df1 <- <your first data frame>
df2 <- data.frame("GE_State"=c(NA,NA,NA,NA, "Schleswig-Holstein",NA,NA,NA,NA,"Berlin"))

Tim Biegeleisen