
I want to write a function that recodes the values of a variable into binary 0/1 columns using ifelse. Suppose I have this dataset:

df <- data.frame(
        id = 1:10,
        region = rep(c("Asia", "Africa", "Europe", "America"), length = 10)
        )

This is the output that I want:

[Image: desired output table, not reproduced here]

However, I want to create those columns with a function, so that I only have to pass the data and the variable to it. This is as far as I have got:

binary <- function(data2, var, value){
        for(i in 1:nrow(data2)){        
            val <- ifelse(data2[data2[var] == value, 1, 0)
            data2 <- cbind(data2, val)
            }
        }

Does anyone know how to use ifelse inside a for loop within a function in R? Any help is much appreciated. Thank you.

3 Answers

The best thing I learned here: multiply a matrix of TRUE and FALSE values by 1 and you get 1s and 0s. Fantastic:

library(dplyr)  # provides the %>% pipe
df %>% cbind(model.matrix(~ region + 0, .) * 1)

output:

   id  region regionAfrica regionAmerica regionAsia regionEurope
1   1    Asia            0             0          1            0
2   2  Africa            1             0          0            0
3   3  Europe            0             0          0            1
4   4 America            0             1          0            0
5   5    Asia            0             0          1            0
6   6  Africa            1             0          0            0
7   7  Europe            0             0          0            1
8   8 America            0             1          0            0
9   9    Asia            0             0          1            0
10 10  Africa            1             0          0            0

OR

We could use cbind with sapply in a pipe:

df %>% 
    mutate(region = factor(region)) %>% 
    cbind(sapply(levels(.$region), `==`, .$region)*1) 

which is the same as this:

library(dplyr)
df %>% 
    mutate(region = factor(region)) %>% 
    cbind(sapply(levels(.$region), `==`, .$region)) %>% 
    mutate(across(Africa:Europe,  ~case_when(. == TRUE ~ 1,
                                             TRUE ~ 0)))

   id  region Africa America Asia Europe
1   1    Asia      0       0    1      0
2   2  Africa      1       0    0      0
3   3  Europe      0       0    0      1
4   4 America      0       1    0      0
5   5    Asia      0       0    1      0
6   6  Africa      1       0    0      0
7   7  Europe      0       0    0      1
8   8 America      0       1    0      0
9   9    Asia      0       0    1      0
10 10  Africa      1       0    0      0

OR as a function:

expand_factor <- function(f) {
    m <- matrix(0, length(f), nlevels(f), dimnames = list(NULL, levels(f)))
    replace(m, cbind(seq_along(f), f), 1)
}
df %>% 
    mutate(region = factor(region)) %>% 
    cbind(expand_factor(.$region)*1)

TarJae

reshape

Doing it that way seems a little inefficient; it appears to be just a pivoting/reshaping operation, so this is a one-shot deal:

df2 <- reshape2::dcast(df, id + region ~ region, value.var = "region")
df2[,unique(df2$region)] <- lapply(df2[,unique(df2$region)], function(z) +!is.na(z))
df2
#    id  region Africa America Asia Europe
# 1   1    Asia      0       0    1      0
# 2   2  Africa      1       0    0      0
# 3   3  Europe      0       0    0      1
# 4   4 America      0       1    0      0
# 5   5    Asia      0       0    1      0
# 6   6  Africa      1       0    0      0
# 7   7  Europe      0       0    0      1
# 8   8 America      0       1    0      0
# 9   9    Asia      0       0    1      0
# 10 10  Africa      1       0    0      0

The dcast pivots (while preserving the original "region" column); the intermediate value (immediately after dcast) is

reshape2::dcast(df, id+region~region, value.var="region")
#    id  region Africa America Asia Europe
# 1   1    Asia   <NA>    <NA> Asia   <NA>
# 2   2  Africa Africa    <NA> <NA>   <NA>
# 3   3  Europe   <NA>    <NA> <NA> Europe
# 4   4 America   <NA> America <NA>   <NA>
# 5   5    Asia   <NA>    <NA> Asia   <NA>
# 6   6  Africa Africa    <NA> <NA>   <NA>
# 7   7  Europe   <NA>    <NA> <NA> Europe
# 8   8 America   <NA> America <NA>   <NA>
# 9   9    Asia   <NA>    <NA> Asia   <NA>
# 10 10  Africa Africa    <NA> <NA>   <NA>

so all we need to do is convert those from strings/NAs to "is or is not NA", which is done using +!is.na(z).
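
To see what +!is.na(z) does in isolation, here is a small sketch on a made-up vector (z and flags are illustrative names, not taken from the data above):

```r
# A made-up character vector with one missing value
z <- c("Asia", NA, "Europe")

# !is.na(z) is logical: TRUE where a value is present
present <- !is.na(z)

# Unary plus coerces the logical vector to integer 1/0
flags <- +!is.na(z)
flags
```

The same unary plus is what turns the comparisons in the sections below into 0/1 columns.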

base R, not reshaping

uniqregion <- unique(df$region)
tmp <- +outer(df$region, uniqregion, `==`)
colnames(tmp) <- uniqregion
tmp
#       Asia Africa Europe America
#  [1,]    1      0      0       0
#  [2,]    0      1      0       0
#  [3,]    0      0      1       0
#  [4,]    0      0      0       1
#  [5,]    1      0      0       0
#  [6,]    0      1      0       0
#  [7,]    0      0      1       0
#  [8,]    0      0      0       1
#  [9,]    1      0      0       0
# [10,]    0      1      0       0
cbind(df, tmp)
#    id  region Asia Africa Europe America
# 1   1    Asia    1      0      0       0
# 2   2  Africa    0      1      0       0
# 3   3  Europe    0      0      1       0
# 4   4 America    0      0      0       1
# 5   5    Asia    1      0      0       0
# 6   6  Africa    0      1      0       0
# 7   7  Europe    0      0      1       0
# 8   8 America    0      0      0       1
# 9   9    Asia    1      0      0       0
# 10 10  Africa    0      1      0       0

Literal function

If you really want a function to loop over it, though, I still recommend lapply over a for loop:

binary <- function(data2, variable) {
  uniq <- unique(data2[[variable]])
  cbind(data2, as.data.frame(
    lapply(setNames(nm = uniq),
           function(z) +(z == data2[[variable]]) )
  ))
}
binary(df, "region")
#    id  region Asia Africa Europe America
# 1   1    Asia    1      0      0       0
# 2   2  Africa    0      1      0       0
# 3   3  Europe    0      0      1       0
# 4   4 America    0      0      0       1
# 5   5    Asia    1      0      0       0
# 6   6  Africa    0      1      0       0
# 7   7  Europe    0      0      1       0
# 8   8 America    0      0      0       1
# 9   9    Asia    1      0      0       0
# 10 10  Africa    0      1      0       0

(You might consider dropping the cbind(data2, ...) here and returning just the Asia:America columns, leaving the caller to decide what to do with them; perhaps that's over-generalizing. Just a thought.)
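
A minimal sketch of that variant, assuming the same df as in the question; binary_cols is a hypothetical name used only for illustration:

```r
df <- data.frame(
  id = 1:10,
  region = rep(c("Asia", "Africa", "Europe", "America"), length.out = 10)
)

# Return only the indicator columns; the caller decides how to combine them
binary_cols <- function(data2, variable) {
  uniq <- unique(data2[[variable]])
  as.data.frame(
    lapply(setNames(nm = uniq),
           function(z) +(z == data2[[variable]]))
  )
}

ind <- binary_cols(df, "region")

# The caller can still bind them back on if wanted
res <- cbind(df, ind)
```

Keeping the cbind outside the function makes it easy to attach the indicators to a different data frame, or to use them on their own.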

Literal function using for loop

But if you really must have it ...

binary2 <- function(data2, variable) {
  uniq <- unique(data2[[variable]])
  for (nm in uniq) {
    data2[[nm]] <- +(data2[[variable]] == nm)
  }
  data2
}
binary2(df, "region")
#    id  region Asia Africa Europe America
# 1   1    Asia    1      0      0       0
# 2   2  Africa    0      1      0       0
# 3   3  Europe    0      0      1       0
# 4   4 America    0      0      0       1
# 5   5    Asia    1      0      0       0
# 6   6  Africa    0      1      0       0
# 7   7  Europe    0      0      1       0
# 8   8 America    0      0      0       1
# 9   9    Asia    1      0      0       0
# 10 10  Africa    0      1      0       0
r2evans

It's generally best to use vectorised functions in R instead of loops. For example, rather than write a custom function with loops, you could use case_when from dplyr to do the same thing:

library(tidyverse)

df %>%
  mutate(
    Asia = case_when(region == "Asia" ~ 1, TRUE ~ 0),
    Africa = case_when(region == "Africa" ~ 1, TRUE ~ 0),
    Europe = case_when(region == "Europe" ~ 1, TRUE ~ 0),
    America = case_when(region == "America" ~ 1, TRUE ~ 0)
  )

Or, simpler version (thanks to MartinGal):

df %>%
  mutate(Asia = +(region == "Asia"),
         Africa = +(region == "Africa"),
         Europe = +(region == "Europe"),
         America = +(region == "America"))
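
Since the question asked about ifelse specifically: ifelse is itself vectorised, so it can be applied to the whole column without a for loop. A minimal sketch in base R, assuming the question's df; binary_one is a hypothetical helper that mirrors the (data, var, value) signature from the question:

```r
df <- data.frame(
  id = 1:10,
  region = rep(c("Asia", "Africa", "Europe", "America"), length.out = 10)
)

# ifelse() evaluates the test for every row at once
df$Asia <- ifelse(df$region == "Asia", 1, 0)

# Hypothetical helper: adds one 0/1 column named after `value`
binary_one <- function(data, var, value) {
  data[[value]] <- ifelse(data[[var]] == value, 1, 0)
  data  # return the modified copy; the original attempt returned nothing
}

out <- binary_one(df, "region", "Africa")
```

Note that the function has to return the data frame explicitly (or as its last expression); modifying data inside the function does not change the caller's copy.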
Rory S