
I have some variables that take values between 1 and 5. I would like to code them as 0 if the value is between 1 and 3 (inclusive) and as 1 if the value is 4 or 5.

My dataset looks like this

var1    var2        var3
1       1            NA
4       3            4
3       4            5
2       5            3

So I would like it to be like this:

var1    var2        var3
0       0            NA
1       0            1
0       1            1
0       1            0

I tried writing a function and calling it:

making_binary <- function (var){
  var <- factor(var >= 4, labels = c(0, 1))
  return(var)
}


df <- lapply(df, making_binary)

But I got this error: incorrect labels: length 2 must be 1 or 1

Where did I go wrong? Thank you very much for your answers!

Emeline

3 Answers


You can use:

df[] <- +(df == 4 | df == 5)
df
#  var1 var2 var3
#1    0    0   NA
#2    1    0    1
#3    0    1    1
#4    0    1    0

The comparison df == 4 | df == 5 returns logical values (TRUE/FALSE). The leading + then converts those logical values to integers (TRUE becomes 1, FALSE becomes 0), and NA stays NA.
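As a small illustration of what the leading `+` does, here is a minimal sketch on a toy logical vector and matrix (the names `flags` and `m` are made up for this example). It also suggests why `+` is convenient here: unlike `as.integer()`, it keeps the dimensions of a matrix.

```r
# Toy logical vector (hypothetical values, not from the question's data)
flags <- c(TRUE, FALSE, NA)

ints <- +flags                        # unary plus coerces logical to integer
typeof(ints)                          # "integer"
identical(ints, as.integer(flags))    # TRUE

# On a logical matrix, + keeps the dim attribute, while as.integer() drops it
m <- matrix(c(TRUE, FALSE, NA, TRUE), nrow = 2)
dim(+m)                               # 2 2
is.null(dim(as.integer(m)))           # TRUE
```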

If you want to apply this for selected columns you can subset the columns by position or by name.

cols <- 1:3 #Position
#cols <- grep('var', names(df)) #Name
df[cols] <- +(df[cols] == 4 | df[cols] == 5)

As for your function, you can write it like this:

making_binary <- function (var){
  var <- as.integer(var >= 4)
  #which is a faster version of
  #var <- ifelse(var >= 4, 1, 0)
  return(var)
}

df[] <- lapply(df, making_binary)
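As for why the original `factor()` call failed: a likely explanation (an assumption, since the full data is not shown) is that at least one column contains only values below 4, or only values of 4 and above. Then `var >= 4` has a single level, and two labels cannot be matched to it. The sketch below reproduces this on a hypothetical column `only_low` and shows that stating both levels explicitly avoids the error.

```r
# Hypothetical column that contains no 4s or 5s
only_low <- c(1L, 3L, 2L)

# only_low >= 4 is all FALSE, so factor() finds one level but gets two labels
res <- tryCatch(
  factor(only_low >= 4, labels = c(0, 1)),
  error = function(e) "error"
)
res   # "error"

# Declaring both levels up front makes the labels line up
fixed <- factor(only_low >= 4, levels = c(FALSE, TRUE), labels = c(0, 1))
levels(fixed)        # "0" "1"
as.character(fixed)  # "0" "0" "0"
```

Note that this still yields a factor rather than numbers, which is one reason `as.integer(var >= 4)` is the simpler choice. Also, `df <- lapply(...)` without the `[]` turns `df` into a plain list.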

data

df <- structure(list(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L, 
5L), var3 = c(NA, 4L, 5L, 3L)), class = "data.frame", row.names = c(NA, -4L))
Ronak Shah
  • I cannot really do that because I have lots of other variables which I do not want to change – Emeline Jun 30 '20 at 07:43
  • Interesting. Could you please explain what this leading `+` means? – Stéphane Laurent Jun 30 '20 at 07:44
  • @Emeline if you only want to change the first and second columns, change `df[]` to `df[, c(1:2)]` – Nico Jun 30 '20 at 07:46
  • Thank you for answering many of my questions and always making it simple for a beginner to understand! I am really improving thanks to you (and others from Stack Overflow!) – Emeline Jun 30 '20 at 07:47
  • @Emeline There are ways to apply the function to selected columns only. See the edit to the answer, which shows a couple of them. – Ronak Shah Jun 30 '20 at 07:52

I think ifelse would fit the problem well:

df[] <- lapply(df, function(x) ifelse(x >= 1 & x <= 3, 0, x))
df
#  var1 var2 var3
#1    0    0   NA
#2    4    0    4
#3    0    4    5
#4    0    5    0
df[] <- lapply(df, function(x) ifelse(x >= 4 & x <= 5, 1, x))

df
#  var1 var2 var3
#1    0    0   NA
#2    1    0    1
#3    0    1    1
#4    0    1    0

If you need to do the two steps at once, you can look at dplyr::case_when() or data.table::fcase().
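A minimal sketch of the one-step version with `dplyr::case_when()`, assuming the dplyr package is installed. The helper name `recode_binary` is made up for this example. Values that match no condition, including NA, come out as NA.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(var1 = c(1L, 4L, 3L, 2L),
                 var2 = c(1L, 3L, 4L, 5L),
                 var3 = c(NA, 4L, 5L, 3L))

# Hypothetical helper: conditions are checked in order, first match wins
recode_binary <- function(x) {
  case_when(
    x >= 4 ~ 1L,   # 4 or 5 becomes 1
    x >= 1 ~ 0L    # 1 to 3 becomes 0
  )
}

df[] <- lapply(df, recode_binary)
df
```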

Eyayaw

You can simply test whether each value is larger than 3, which returns TRUE or FALSE, and then cast the result to a number:

+(x>3)
#     var1 var2 var3
#[1,]    0    0   NA
#[2,]    1    0    1
#[3,]    0    1    1
#[4,]    0    1    0

In case you want this only for some columns, you have to subset them. E.g. for column 1 and 2:

+(x[1:2]>3)
#+(x[c("var1","var2")]>3)  #Alternative
#     var1 var2
#[1,]    0    0
#[2,]    1    0
#[3,]    0    1
#[4,]    0    1
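Note that `+(x > 3)` returns a matrix rather than a data frame, as the `[1,]` row labels in the output suggest. A minimal sketch of writing the recoded values back into selected columns while leaving everything else untouched, assuming a hypothetical extra column `id` that is not in the original data:

```r
# Question's data plus a hypothetical id column that must not change
x <- data.frame(id   = c("a", "b", "c", "d"),
                var1 = c(1L, 4L, 3L, 2L),
                var2 = c(1L, 3L, 4L, 5L),
                var3 = c(NA, 4L, 5L, 3L))

cols <- c("var1", "var2")       # columns to recode

is.matrix(+(x[cols] > 3))       # TRUE: the comparison yields a matrix

# Assigning into x[cols] keeps x a data frame and overwrites only those columns
x[cols] <- +(x[cols] > 3)
x
```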

Data:

x <- data.frame(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L, 5L)
              , var3 = c(NA, 4L, 5L, 3L))
GKi