R dplyr: Drop multiple columns

Question

I have a dataframe and a list of columns in that dataframe that I'd like to drop. Let's use the iris dataset as an example. I'd like to drop Sepal.Length and Sepal.Width and use only the remaining columns. How do I do this using select or select_ from the dplyr package?

Here's what I've tried so far:

drop.cols <- c('Sepal.Length', 'Sepal.Width')
iris %>% select(-drop.cols)

Error in -drop.cols : invalid argument to unary operator

iris %>% select_(.dots = -drop.cols)

Error in -drop.cols : invalid argument to unary operator

iris %>% select(!drop.cols)

Error in !drop.cols : invalid argument type

iris %>% select_(.dots = !drop.cols)

Error in !drop.cols : invalid argument type

I feel like I'm missing something obvious, because this seems like a pretty useful operation that should already exist. On GitHub, someone posted a similar issue, and Hadley said to use 'negative indexing'. That's what (I think) I've tried, but to no avail. Any suggestions?

score 145 · Accepted Answer · answered Mar 07 '16 at 08:59


Check the help on select_vars. That gives you some extra ideas on how to work with this.

In your case:

iris %>% select(-one_of(drop.cols))
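As a minimal sketch, assuming dplyr 1.0.0 or later is installed: one_of() still works there, and the tidyselect helpers all_of() and any_of() cover the same case. all_of() raises an error if a listed column is missing, while any_of() silently skips it. The name 'not_a_column' is made up for illustration.

```r
library(dplyr)

drop.cols <- c("Sepal.Length", "Sepal.Width")

# The form from the answer above
kept_one_of <- iris %>% select(-one_of(drop.cols))

# Strict variant: errors if any name in drop.cols is not a column of iris
kept_all_of <- iris %>% select(-all_of(drop.cols))

# Lenient variant: names that are not columns are ignored
# ("not_a_column" is a hypothetical name, absent from iris)
kept_any_of <- iris %>% select(-any_of(c(drop.cols, "not_a_column")))

names(kept_all_of)
```

Which helper to prefer depends on whether a missing column should be treated as a mistake (all_of) or as harmless (any_of).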

answered Mar 07 '16 at 08:59

phiver


Thanks. For some reason, this works on `iris`, but not on my actual dataframe (`iris` was a toy example). My dataframe contains 4558 rows and 147 columns. The error message I received was `Error in eval(x$expr, data, x$env) : variable names are limited to 10000 bytes`. Any idea why this might be happening? – Navaneethan Santhanam Mar 07 '16 at 10:20

Ah, looks like I was making a mistake. I accidentally used `select_vars` instead of `select`. Now it works perfectly! – Navaneethan Santhanam Mar 07 '16 at 11:04

Where are we supposed to find out about inbuilt functions like `one_of`? Unless I'm missing something it doesn't appear in the package documentation (`help(package='dplyr')`). – geotheory Jul 05 '16 at 22:48

@geotheory, actually one_of is documented; see `help(one_of, package = "dplyr")`. At least it is in package version 0.5.0. It also helps to read the [blogs](https://blog.rstudio.org) that Hadley posts when one of his packages is updated. Some functions are documented inside the help pages of other functions. Unfortunately, finding them requires reading all the documentation, which I mostly do when I want something that isn't immediately obvious or possible with the function. – phiver Jul 06 '16 at 06:44

Thanks. How do you find out about these functions in the first place, in terms of documentation? – geotheory Jul 06 '16 at 14:03

score 95 · Answer 2 · edited May 25 '17 at 19:33


Also try:

## Notice the lack of quotes
iris %>% select(-c(Sepal.Length, Sepal.Width))

edited May 25 '17 at 19:33

Ricardo Saporta


answered Mar 20 '17 at 19:58

Miguel Rayon Gonzalez



Great! Really useful when we have to drop columns by copy-pasting the names from the console. – Pablo Casas Jun 16 '17 at 19:26

sbha · Answer 3 · 2018-04-20T17:32:26.040

56

Beyond select(-one_of(drop.cols)) there are a couple other options for dropping columns using select() that do not involve defining all the specific column names (using the dplyr starwars sample data for some more variety in column names):

starwars %>% 
  select(-(name:mass)) %>%        # the range of columns from 'name' to 'mass'
  select(-contains('color')) %>%  # any column name that contains 'color'
  select(-starts_with('bi')) %>%  # any column name that starts with 'bi'
  select(-ends_with('er')) %>%    # any column name that ends with 'er'
  select(-matches('^f.+s$')) %>%  # any column name matching the regex pattern
  select_if(~!is.list(.)) %>%     # not by column name but by data type
  head(2)

# A tibble: 2 x 2
  homeworld species
  <chr>     <chr>  
1 Tatooine  Human  
2 Tatooine  Droid

edited Apr 20 '18 at 17:32

answered Mar 31 '18 at 16:12

sbha


Is `select_if(~!is.list(.))` equivalent to `select_if(is.list(.))`? – Jasha Nov 18 '18 at 21:00

In this case `~` is purrr shorthand for defining an anonymous function; it isn't another symbol for "not". For example, `function(x) {!is.list(x)}` and `~!is.list(.)` mean the same thing. Think of `~` as shorthand for `function(.)`. – SlyFox Jan 29 '19 at 14:52
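The equivalence described in this comment can be checked on a small table. This is only a sketch: the tibble `df` and its columns `id` and `tags` are invented for illustration, and it uses select_if() because that is the function from the answer.

```r
library(dplyr)

# Hypothetical data: one atomic column and one list column
df <- tibble(id = 1:2, tags = list("a", c("b", "c")))

# Formula shorthand: `.` stands for the column being tested
by_formula <- df %>% select_if(~ !is.list(.))

# The same predicate written as an ordinary anonymous function
by_function <- df %>% select_if(function(x) !is.list(x))

# Without the `!`, the predicate keeps list columns instead of dropping them
only_lists <- df %>% select_if(is.list)

identical(names(by_formula), names(by_function))
```

So `~!is.list(.)` and `is.list` select opposite sets of columns; the `!` does the negation, not the `~`.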

score 12 · Answer 4 · answered Jun 08 '17 at 05:12


Be careful with the select() function, because it's used both in the dplyr and MASS packages, so if MASS is loaded, select() may not work properly. To find out what packages are loaded, type sessionInfo() and look for it in the "other attached packages:" section. If it is loaded, type detach( "package:MASS", unload = TRUE ), and your select() function should work again.

answered Jun 08 '17 at 05:12

Durand Sinclair



Alternatively, you could access the function directly in the package namespace, like so: `dplyr::select()`. – Triamus Aug 17 '17 at 07:02

I've run into this problem too often. Now I usually define a new function at the top of my script: `dselect <- dplyr::select`. – filups21 Sep 04 '19 at 21:51

Packages that are loaded later take precedence. I always run `p_load(tidyverse)` after all other packages are loaded, to ensure its functions are not unintentionally masked by another package. – taiyodayo Aug 04 '21 at 08:44
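The two workarounds from these comments can be sketched as follows. The alias name `dselect` comes from the comment above. Note that the alias is assigned the function itself, without trailing parentheses. MASS is deliberately not loaded here, so the example only shows the syntax, not the masking itself.

```r
library(dplyr)

drop.cols <- c("Sepal.Length", "Sepal.Width")

# Option 1: name the package explicitly, so it does not matter
# which other package also exports a function called select()
kept_qualified <- iris %>% dplyr::select(-one_of(drop.cols))

# Option 2: bind the dplyr function to an unambiguous alias once,
# then use the alias throughout the script
dselect <- dplyr::select
kept_alias <- iris %>% dselect(-one_of(drop.cols))

identical(kept_qualified, kept_alias)
```

Either form avoids having to detach MASS, which may be undesirable if other code in the session depends on it.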

score 6 · Answer 5 · answered Mar 07 '16 at 08:59


We can try

iris %>% 
      select_(.dots= setdiff(names(.),drop.cols))

answered Mar 07 '16 at 08:59

akrun


Thanks @akrun, this worked perfectly. However, given the hype surrounding `dplyr`'s ability to make basic analysis tasks easy to read and write, I'm disappointed that the actual solution looks like a workaround. – Navaneethan Santhanam Mar 07 '16 at 10:23
@NavaneethanSanthanam Actually, the `one_of` in the other solution is the way to go. I forgot about it. – akrun Mar 07 '16 at 11:41

stevec · Answer 6 · 2021-07-15T05:34:19.360

For anyone arriving here wanting to drop a range of columns.

Minimal reproducible example

Drop a range of columns like so:

iris %>% 
  select(-(Sepal.Width:Petal.Width)) %>% 
  head

#   Sepal.Length Species
# 1          5.1  setosa
# 2          4.9  setosa
# 3          4.7  setosa
# 4          4.6  setosa
# 5          5.0  setosa
# 6          5.4  setosa

Note:

The parentheses around the column range are important and must be used.

score 4 · Answer 7 · answered Feb 20 '19 at 20:32


Another way is to mutate the undesired columns to NULL. This avoids the nested parentheses:

head(iris,2) %>% mutate_at(drop.cols, ~NULL)
#   Petal.Length Petal.Width Species
# 1          1.4         0.2  setosa
# 2          1.4         0.2  setosa

answered Feb 20 '19 at 20:32

moodymudskipper


This also doesn't give a warning if a column is not there. – skoz Aug 09 '19 at 05:35

score 3 · Answer 8 · edited Feb 28 '19 at 15:24


If your column names contain special characters, select or select_ may not work as expected. In that case you can use the "." placeholder, which refers to the data set passed through the pipe, together with base R indexing. For the data set in the question, the following lines solve the problem:

drop.cols <- c('Sepal.Length', 'Sepal.Width')
iris %>% .[, setdiff(names(.), drop.cols)]
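The same idea can be written without the pipe, in base R only. The sketch below is an illustration under assumptions: the data frame `df` and its awkward column names are invented, and `drop = FALSE` is an addition not present in the answer. It keeps the result a data frame even when only one column survives.

```r
# Hypothetical data frame whose names contain a space and a hyphen;
# check.names = FALSE stops data.frame() from rewriting them
df <- data.frame(
  `first col`  = 1:2,
  `second-col` = 3:4,
  keep         = c("a", "b"),
  check.names  = FALSE
)

drop.cols <- c("first col", "second-col")

# Keep every column whose name is not listed in drop.cols.
# drop = FALSE prevents a single remaining column from collapsing to a vector.
kept <- df[, setdiff(names(df), drop.cols), drop = FALSE]

names(kept)
```

Because the column names are handled as plain strings, no quoting or backticks are needed for names with special characters.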

edited Feb 28 '19 at 15:24

JelenaČuklina


answered May 22 '18 at 10:26

dineshram mattapalli


Code only answers are discouraged. Please provide some explanation as to how the answer works and how it differs from the already present answers. – Ralf Stubner May 22 '18 at 14:52
Thank you!! None of the other solutions above worked for this exact reason. – Marty999 Jul 09 '19 at 17:28

score 2 · Answer 9 · answered Oct 04 '19 at 13:27


You can try

iris %>% select(-!!drop.cols)

answered Oct 04 '19 at 13:27

Lefty


score 1 · Answer 10 · edited May 11 '21 at 23:43


I faced the same issue, but the real cause was that I had loaded a library that defines another function with the same name, select(). In my case it clashed with the select function from the MASS package.

After detaching the MASS library, the error stopped.

edited May 11 '21 at 23:43

Dharman


answered May 11 '21 at 23:37

Deep Kiran Lokhande


Note that you can also just specify `select` from the `dplyr` library by doing `dplyr::select` – Parseltongue May 11 '21 at 23:46
