5

I want to compare three variables row by row. If all three have the same value (e.g. 0, 0, 0 or 2, 2, 2), the result should be a value such as 'match'.

I tried this:


df_1 <- data.frame(
  x = c(0, 1, 0, 2, 0), 
  y = c(0, 2, 1, 2, 1), 
  z = c(0, 2, 1, 2, 1)
)


ifelse(df_1$x == df_1$y == df_1$z,  'match', 'not')

Error: unexpected '==' in "ifelse(df_1$x == df_1$y =="

But it doesn't work. Thanks.

neves

7 Answers

6

You need an & in there, so df_1$x == df_1$y & df_1$y == df_1$z, i.e. x equals y AND y equals z. You also don't need ifelse for this kind of comparison. Just do the comparison and add the output to your data frame:

df_1$match <- df_1$x == df_1$y & df_1$y == df_1$z

#### OUTPUT ####
  x y z match
1 0 0 0  TRUE
2 1 2 2 FALSE
3 0 1 1 FALSE
4 2 2 2  TRUE
5 0 1 1 FALSE
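
As a side note on why the original expression fails: R's comparison operators do not chain, so `a == b == c` is rejected by the parser. Adding parentheses makes it parse, but it then compares a logical result with a number, which is not the intended test. A minimal sketch using the question's data as plain vectors (the names `chained` and `pairwise` are only for illustration):

```r
x <- c(0, 1, 0, 2, 0)
y <- c(0, 2, 1, 2, 1)
z <- c(0, 2, 1, 2, 1)

# The unparenthesised form does not even parse
parse_fails <- inherits(try(parse(text = "x == y == z"), silent = TRUE),
                        "try-error")

# This parses, but (x == y) is TRUE/FALSE, which is coerced to 1/0
# before being compared with z, so rows 1 and 4 are missed
chained <- (x == y) == z

# Two pairwise comparisons joined with & give the intended result
pairwise <- x == y & y == z
```

With this data `chained` is FALSE for every row, while `pairwise` is TRUE for rows 1 and 4.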

However, if you really want "match" and "not" you can do that too:

df_1$match <- ifelse(df_1$x == df_1$y & df_1$y == df_1$z, "match", "not")

#### OUTPUT ####

  x y z match
1 0 0 0 match
2 1 2 2   not
3 0 1 1   not
4 2 2 2 match
5 0 1 1   not

Edit based on comment:

For an arbitrary number of variables you could try something like this, which checks that unique only returns one value, i.e. all are equal:

df_1$match <- apply(df_1, 1, function(r) length(unique(r)) == 1)
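
One thing to watch for: if `df_1` already has a `match` column from the earlier steps, `apply(df_1, 1, ...)` will include that column in the comparison. A hedged sketch that restricts the check to the value columns (the vector `value_cols` is an assumption for illustration, not part of the original answer):

```r
df_1 <- data.frame(
  x = c(0, 1, 0, 2, 0),
  y = c(0, 2, 1, 2, 1),
  z = c(0, 2, 1, 2, 1)
)

# Simulate the column added earlier in this answer
df_1$match <- df_1$x == df_1$y & df_1$y == df_1$z

# Columns that should actually be compared (assumed names)
value_cols <- c("x", "y", "z")

# Row-wise check limited to the value columns
all_same <- apply(df_1[value_cols], 1, function(r) length(unique(r)) == 1)

# For contrast: the same check over every column, including `match`
with_extra <- apply(df_1, 1, function(r) length(unique(r)) == 1)
```

Here `all_same` flags rows 1 and 4, whereas `with_extra` flags none of the rows for this data because the extra column differs from the values.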
  • Is there an alternative to this with loop (eg `lapply` with `ifelse`)? – neves Nov 01 '19 at 05:44
  • Example: `lapply(X = df_1, FUN = function(x) { ifelse(x == x, 'match', 'not') })` – neves Nov 01 '19 at 05:47
  • @neves yes probably, but it's not necessarily a better alternative. The above utilizes vectorization, so it's simpler and likely faster than using one of the `*apply`s. –  Nov 01 '19 at 05:47
  • What would it look like with `*apply` functions? Suppose I have a dataset with 10 variables to compare. – neves Nov 01 '19 at 05:52
  • Yes. Because comparing many variables is very laborious in this way. – neves Nov 01 '19 at 05:56
  • @neves I've added an update using `apply` that deals with any number of variables. –  Nov 01 '19 at 06:15
2

If you have a large number of variables you can do:

df_1$match <- c("match", "no match")[apply(df_1, 1, function(x) length(unique(x)) != 1) + 1]
df_1

  x y z    match
1 0 0 0    match
2 1 2 2 no match
3 0 1 1 no match
4 2 2 2    match
5 0 1 1 no match
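
The indexing trick in that one-liner may be easier to follow in isolation. Adding 1 to a logical vector turns FALSE/TRUE into 1/2, which can then be used as positions into a two-element label vector. A small sketch (the names `differs` and `labels` are illustrative only):

```r
# TRUE where a row has more than one distinct value
differs <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

# FALSE + 1 is 1 and TRUE + 1 is 2
positions <- differs + 1

# Position 1 picks "match", position 2 picks "no match"
labels <- c("match", "no match")[positions]
```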
Ritchie Sacramento
2

This post gives various ways to test whether all elements of a vector are the same. Since a data frame is a list of vectors, you can choose one of these methods and apply it to your data frame with one of the *apply() functions, purrr, or a loop.

Here is one option with purrr:

library(purrr)

df_1$comparison <- map_chr(as.data.frame(t(df_1)), ~ ifelse(
  length(unique(.x)) == 1, 'match', 'not'))

Output:

  x y z comparison
1 0 0 0      match
2 1 2 2        not
3 0 1 1        not
4 2 2 2      match
5 0 1 1        not
prosoitos
2

You could potentially also use rowSums():

rowSums(df_1[, -1] == df_1[, 1]) == length(df_1[, -1])

[1]  TRUE FALSE FALSE  TRUE FALSE

It checks whether the columns from the second onward are the same as the first column. If all of them are the same, it returns TRUE for that row.
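
To make the intermediate steps visible, here is a sketch that splits the one-liner into parts (the names `eq`, `n_equal` and `all_equal` are assumptions for illustration):

```r
df_1 <- data.frame(
  x = c(0, 1, 0, 2, 0),
  y = c(0, 2, 1, 2, 1),
  z = c(0, 2, 1, 2, 1)
)

# Logical matrix: is each remaining column equal to the first column?
eq <- df_1[, -1] == df_1[, 1]

# Number of columns per row that agree with the first column
n_equal <- rowSums(eq)

# A row matches when every compared column agrees
all_equal <- n_equal == ncol(eq)
```

For this data `n_equal` is 2, 0, 0, 2, 0, so only rows 1 and 4 reach the required count of 2.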

And if you need a match/not result:

ifelse(rowSums(df_1[, -1] == df_1[, 1]) == length(df_1[, -1]), "match", "not")
tmfmnk
2

You can combine ifelse with apply, and use unique to check whether each row matches:

df_1$match <- apply(df_1, 1, function(x) ifelse(length(unique(x))==1, 'match','not'))
ThomasIsCoding
1

Similar to @tmfmnk answer (updated according to @Cole's comment):

ifelse(rowMeans(df_1 == df_1[, 1]) == 1, 'match', 'not')
#[1] "match" "not"   "not"   "match" "not" 
utubun
1

Here's an approach with Reduce()

n_cols <- length(df_1)

Reduce(`&`,
       lapply(seq_len(n_cols - 1),
              function(j) df_1[[j]] == df_1[[j+1]])
       )
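
If it helps, the same idea can be wrapped in a small function. This is only a sketch: the name `all_cols_equal` and the handling of data frames with fewer than two columns are assumptions, not part of the answer above.

```r
# Hypothetical helper: TRUE for each row where all columns hold the same value
all_cols_equal <- function(df) {
  n_cols <- length(df)
  # With fewer than two columns there is nothing to compare (assumed behaviour)
  if (n_cols < 2) return(rep(TRUE, nrow(df)))
  # Compare each column with its right-hand neighbour, then AND the results
  Reduce(`&`,
         lapply(seq_len(n_cols - 1),
                function(j) df[[j]] == df[[j + 1]]))
}

df_1 <- data.frame(
  x = c(0, 1, 0, 2, 0),
  y = c(0, 2, 1, 2, 1),
  z = c(0, 2, 1, 2, 1)
)

res <- all_cols_equal(df_1)
labels <- ifelse(res, "match", "not")
```

Because each step is a whole-column comparison, this stays vectorized over rows regardless of how many columns are compared.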

Here is the performance of some of the answers evaluating to TRUE or FALSE:

# A tibble: 4 x 13
  expression                                                 min  median
  <bch:expr>                                             <bch:t> <bch:t>
1 Reduce_way                                              47.7us  50.5us
2 rowSums(df_1[, -1] == df_1[, 1]) == length(df_1[, -1]) 159.6us 168.6us
3 apply(df_1, 1, function(x) length(unique(x)) == 1)     150.6us 158.1us
4 df_1[[1]] == df_1[[2]] & df_1[[2]] == df_1[[3]]         27.5us  29.6us

The performance depends on the number of columns and rows being evaluated. For instance, 100,000 x 3:

df_1 <- as.data.frame(replicate(3, sample(3, 100000, replace = TRUE)))

  expression                                                  min  median
  <bch:expr>                                             <bch:tm> <bch:t>
1 Reduce_way                                              931.5us  1.13ms
2 rowSums(df_1[, -1] == df_1[, 1]) == length(df_1[, -1])  10.96ms 12.69ms
3 apply(df_1, 1, function(x) length(unique(x)) == 1)        1.01s   1.01s
4 df_1[[1]] == df_1[[2]] & df_1[[2]] == df_1[[3]]         894.8us  1.06ms

# following is used from here on out instead of writing out df_1[[1]] == ...

n_cols <- length(df_1)
eval_parse <- paste(
  apply(matrix(rep(seq_len(n_cols), c(1, rep(2, n_cols - 2), 1)), 2),
        2, 
        function(cols) paste0("df_1[[", cols, "]]", collapse = ' == ')
  ),
  collapse = ' & '
)

## for 100 x 1000 data.frame

df_1 <- as.data.frame(replicate(1000, sample(3, 100, replace = TRUE)))

# A tibble: 4 x 13
  expression                                                min median `itr/sec`
  <bch:expr>                                             <bch:> <bch:>     <dbl>
1 Reduce_way                                             15.9ms 16.3ms      60.9
2 rowSums(df_1[, -1] == df_1[, 1]) == length(df_1[, -1]) 16.5ms 17.1ms      58.1
3 apply(df_1, 1, function(x) length(unique(x)) == 1)     10.4ms 10.7ms      92.4
4 eval(parse(text = eval_parse))                         20.1ms 20.6ms      47.4
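
For anyone who wants to rerun a comparison like this, the following is a rough sketch rather than the exact script behind the tables. It assumes the bench package is available (the `<bch:expr>` columns suggest that package was used), and the names `df_b`, `reduce_way` and `apply_way`, the seed, and the data size are all illustrative choices.

```r
set.seed(1)  # arbitrary seed so the sample data is reproducible
df_b <- as.data.frame(replicate(3, sample(3, 1000, replace = TRUE)))

# Pairwise column comparison combined with Reduce()
reduce_way <- function(d) {
  Reduce(`&`, lapply(seq_len(length(d) - 1),
                     function(j) d[[j]] == d[[j + 1]]))
}

# Row-wise check that only one distinct value is present
apply_way <- function(d) apply(d, 1, function(r) length(unique(r)) == 1)

# Confirm both approaches agree before timing them
same_answer <- all(reduce_way(df_b) == apply_way(df_b))

# Only run the timing if bench is installed
if (requireNamespace("bench", quietly = TRUE)) {
  timings <- bench::mark(
    Reduce_way = reduce_way(df_b),
    apply_way  = apply_way(df_b),
    check = FALSE  # skip bench's own equality check; agreement is tested above
  )
  print(timings)
}
```

Timings will vary by machine and data shape, so the numbers should be read as relative rather than absolute.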
Cole