1

Let's say I have this dataframe:

ID   X1           X2
1    1            2
2    2            1
3    3            1
4    4            1
5    5            5
6    6            20
7    7            20
8    9            20
9    10           20


dataset <- structure(list(ID = 1:9, X1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L, 
                                           10L), X2 = c(2L, 1L, 1L, 1L, 5L, 20L, 20L, 20L, 20L)),
                     class = "data.frame", row.names = c(NA, 
                                                         -9L))

And I want to select rows in which the absolute value of the subtraction of rows are more or equal to 2 (based on columns X1 and X2).

For example, row 4 value is 4-1, which is 3 and should be selected.

Row 9 value is 10-20, which is -10. Absolute value is 10 and should be selected.

In this case it would be rows 3, 4, 6, 7, 8 and 9

I tried:

dataset2 = dataset[,abs(dataset- c(dataset[,2])) > 2] 

But I get an error.

The operation:

abs(dataset- c(dataset[,2])) > 2

Does give me rows that the sum are more than 2, but the result only works for my second column and does not select properly

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Gabriel G.
  • 555
  • 1
  • 3
  • 13
  • Could you clarify what the mathematical operation is that the code should perform and which absolute value you are referring to? – tester Nov 04 '20 at 23:29

1 Answers1

2

We can get the difference between the 'X1', 'X2' columns, create a logical expression in subset to subset the rows

subset(dataset, abs(X1 - X2) >= 2)
#  ID X1 X2
#3  3  3  1
#4  4  4  1
#6  6  6 20
#7  7  7 20
#8  8  9 20
#9  9 10 20

Or using index

subset(dataset, abs(dataset[[2]] - dataset[[3]]) >= 2)

data

dataset <- structure(list(ID = 1:9, X1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L, 
10L), X2 = c(2L, 1L, 1L, 1L, 5L, 20L, 20L, 20L, 20L)),
class = "data.frame", row.names = c(NA, 
-9L))
akrun
  • 874,273
  • 37
  • 540
  • 662