6

Is it possible to filter in dplyr by the position of a column?

I know how to do it without dplyr

iris[iris[,1]>6,]

But how can I do it in dplyr?

Thanks!

lokheart

3 Answers

14

Besides the suggestion by @thelatemail, you can also use filter_at() and pass the column number to its .vars argument:

iris %>% filter_at(1, all_vars(. > 6))

all(iris %>% filter_at(1, all_vars(. > 6)) == iris[iris[,1] > 6, ])
# [1] TRUE
Psidom
9

No magic: just use the column number, as above, rather than the variable (column) name:

library("dplyr")

iris %>%
  filter(iris[,1] > 6)

Which, as @eipi10 commented, is better written with `.` so that it refers to the piped data:

iris %>%
  filter(.[[1]] > 6)
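
The difference between the two forms only shows up once the data has been changed earlier in the pipe. A minimal sketch, assuming the magrittr pipe and the built-in iris data (the names `changed`, `inside`, and `outside` are illustrative, not from the original answer):

```r
library(dplyr)

# Overwrite the first column before filtering by position.
changed <- iris %>% mutate(Sepal.Length = 0)

# `.` is the piped (modified) data, so no row has a first column above 6.
inside <- changed %>% filter(.[[1]] > 6)
nrow(inside)  # 0

# `iris[, 1]` reaches back to the original data frame, so rows are kept
# according to the unmodified Sepal.Length values.
outside <- changed %>% filter(iris[, 1] > 6)
nrow(outside)
```

Using `[[1]]` rather than `[, 1]` also keeps the test a plain vector if the piped object is a tibble, where `[, 1]` returns a one-column tibble.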
Scransom
  • 4
    Probably should be `filter(.[,1] > 6)`. It doesn't matter here, but in general if you've changed the initial data frame with other piped functions before the filter, `filter(iris[,1] > 6)` will reach outside the pipe to the original data frame, rather than use the piped data frame. – eipi10 Sep 25 '17 at 03:41
  • 2
    Just as an example where these two are not comparable - `iris %>% mutate(Sepal.Length=0) %>% filter(iris[,1] > 6)` vs `iris %>% mutate(Sepal.Length=0) %>% filter(.[,1] > 6)` – thelatemail Sep 25 '17 at 03:45
6

dplyr >= 1.0.0

Scoped verbs (_if, _at, _all), and by extension all_vars() and any_vars(), have been superseded by across(). For filter(), the functions if_any() and if_all() were added to combine logic across multiple columns when subsetting (these verbs are available in dplyr >= 1.0.4):

if_any() and if_all() are used to apply the same predicate function to a selection of columns and combine the results into a single logical vector.

The first argument to across, if_any, and if_all is still tidy-select syntax for column selection, which includes selection by column position.
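
As a rough illustration of position-based tidy-select inside these helpers, the sketch below assumes dplyr >= 1.0.4 and the built-in iris data. The thresholds and result names are made up for the example, and `.x` is used in place of `.` inside the formulas only to keep it visually distinct from the pipe placeholder:

```r
library(dplyr)

# A vector of positions: columns 1 and 3 must both exceed 5.
both_long <- iris %>% filter(if_all(c(1, 3), ~ .x > 5))

# A range of positions: columns 1 through 2 must both exceed 3.
first_two <- iris %>% filter(if_all(1:2, ~ .x > 3))

# A positional helper: last_col() selects the final column (Species here).
setosa_only <- iris %>% filter(if_any(last_col(), ~ .x == "setosa"))
```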

Single Column

In your single-column case, any of the following gives the same result:

iris %>% 
  filter(across(1, ~ . > 6))

iris %>% 
  filter(if_any(1, ~ . > 6))

iris %>% 
  filter(if_all(1, ~ . > 6))

Multiple Columns

If you're applying a predicate function or formula across multiple columns, then across() might give unexpected results (and later dplyr releases deprecate using across() inside filter()), so in this case you should use if_any() and if_all():

iris %>% 
  filter(if_all(c(2, 4), ~ . > 2.3)) # by column position

  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1          6.3         3.3          6.0         2.5 virginica
2          7.2         3.6          6.1         2.5 virginica
3          5.8         2.8          5.1         2.4 virginica
4          6.3         3.4          5.6         2.4 virginica
5          6.7         3.1          5.6         2.4 virginica
6          6.7         3.3          5.7         2.5 virginica

Notice this returns rows where all selected columns have a value greater than 2.3, which is a subset of rows where any of the selected columns meet the logic:

iris %>% 
  filter(if_any(ends_with("Width"), ~ . > 2.3)) # same column selection as above

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1           5.1         3.5          1.4         0.2    setosa
2           4.9         3.0          1.4         0.2    setosa
3           4.7         3.2          1.3         0.2    setosa
4           4.6         3.1          1.5         0.2    setosa
5           5.0         3.6          1.4         0.2    setosa
6           6.7         3.3          5.7         2.5 virginica
7           6.7         3.0          5.2         2.3 virginica
8           6.3         2.5          5.0         1.9 virginica
9           6.5         3.0          5.2         2.0 virginica
10          6.2         3.4          5.4         2.3 virginica
11          5.9         3.0          5.1         1.8 virginica

The output above was shortened to be more compact for this example.
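
One way to sanity-check the two helpers is to compare them with the base R indexing from the question: if_all() behaves like joining the per-column tests with `&`, and if_any() like joining them with `|`. A small sketch under the same assumptions (dplyr >= 1.0.4, built-in iris; the variable names are illustrative):

```r
library(dplyr)

# dplyr versions, selecting columns 2 and 4 by position.
all_rows <- iris %>% filter(if_all(c(2, 4), ~ .x > 2.3))
any_rows <- iris %>% filter(if_any(c(2, 4), ~ .x > 2.3))

# Base R equivalents using positional indexing.
base_all <- iris[iris[, 2] > 2.3 & iris[, 4] > 2.3, ]
base_any <- iris[iris[, 2] > 2.3 | iris[, 4] > 2.3, ]

# Row names differ (filter() renumbers them), so compare the values only.
all(all_rows == base_all)
all(any_rows == base_any)

# Every row kept by if_all() is also kept by if_any().
nrow(all_rows) <= nrow(any_rows)
```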

LMc