27

I have a data frame where all the variables are of character type. Many of the columns are completely empty, i.e. only the variable headers are there, but no values. Is there any way to subset out the empty columns?

ZygD
user702432

9 Answers

29

If your empty columns are really empty character columns, something like the following should work. It will need to be modified if your "empty" character columns include, say, spaces.

Sample data:

mydf <- data.frame(
  A = c("a", "b"),
  B = c("y", ""),
  C = c("", ""),
  D = c("", ""),
  E = c("", "z")
)
mydf
#   A B C D E
# 1 a y      
# 2 b       z

Identifying and removing the "empty" columns.

mydf[!sapply(mydf, function(x) all(x == ""))]
#   A B E
# 1 a y  
# 2 b   z

Alternatively, as recommended by @Roland:

> mydf[, colSums(mydf != "") != 0]
  A B E
1 a y  
2 b   z
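
If "empty" should also cover cells that contain only spaces or NA values, a hedged variation on the same idea (not part of the original answer; is_blank is just an illustrative helper) would be:

is_blank <- function(x) is.na(x) | trimws(x) == ""   # NA or whitespace-only counts as empty
mydf[!sapply(mydf, function(x) all(is_blank(x)))]
#   A B E
# 1 a y  
# 2 b   z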
A5C1D2H2I1M1N2O1R2T1
  • I just recommend extending the condition in case there are NA values instead of "": `all(x == "" | is.na(x))`. But it will lose a bit of the elegance, though :) – Tomas Jul 16 '13 at 09:36
  • `mydf[,colSums(mydf!="")!=0]` might be faster. – Roland Jul 16 '13 at 09:43
  • @Tomas, my *guess* here was that some data got read in incorrectly, leading to empty character columns, and `NA` is different than a read error, but good suggestion anyway. – A5C1D2H2I1M1N2O1R2T1 Jul 16 '13 at 09:45
  • Hi Ananda... Actually, the reading-in is correct. The external data is in the form of many blocks appended together. I have to parse out the individual blocks and throw out the irrelevant columns. – user702432 Jul 16 '13 at 09:49
  • @AnandaMahto Maybe `mydf[,colSums(mydf!="")>0]` is more to your taste? – Roland Jul 16 '13 at 09:51
  • @user702432, Cool. Glad it worked out for you. In the future, please do try to make a *reproducible example*, as I have done in this answer. – A5C1D2H2I1M1N2O1R2T1 Jul 16 '13 at 09:52
14

If you're talking about columns where all values are NA, use remove_empty("cols") from the janitor package.

If you have character vectors where every value is the empty string "", you can first convert those values to NA throughout your data.frame with na_if from the dplyr package:

dat <- data.frame(
  x = c("a", "b", "c"),
  y = c("", "", ""),
  z = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

dat
#>   x y  z
#> 1 a   NA
#> 2 b   NA
#> 3 c   NA

library(dplyr)
library(janitor)

dat %>%
  mutate_all(funs(na_if(., ""))) %>%
  remove_empty("cols")
#>   x
#> 1 a
#> 2 b
#> 3 c
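
funs() has since been deprecated in dplyr. Assuming dplyr >= 1.0, a sketch of the same pipeline using across() (restricted to character columns, since na_if() now requires the replacement value to match the column type) might be:

dat %>%
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  remove_empty("cols")
#>   x
#> 1 a
#> 2 b
#> 3 c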
Sam Firke
13

You can do either of the following:

emptycols <- sapply(df, function (k) all(is.na(k)))
df <- df[!emptycols]

or:

emptycols <- colSums(is.na(df)) == nrow(df)
df <- df[!emptycols]

If by empty you mean they are "", the second approach can be adapted like so:

emptycols <- colSums(df == "") == nrow(df)
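
If a column should count as empty when every value is either NA or "", a hedged combination of the two checks (assuming the comparison df == "" is valid for your columns) could be:

emptycols <- colSums(is.na(df) | df == "") == nrow(df)  # TRUE where every value is NA or ""
df <- df[!emptycols]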
asb
  • This is throwing an error. I think is.na() works only on numeric variables. – user702432 Jul 16 '13 at 09:29
  • user702432, the error thrown is not because of what you're saying; `is.na` will work just fine on character vectors as well. It's because he's passing an anonymous function to another. @asb, I think you can't do that; you'll have to do `sapply(df, function(x) all(is.na(x)))`. – Arun Jul 16 '13 at 09:30
  • Oversight. Just fixed it. – asb Jul 16 '13 at 09:31
6

I have a similar situation -- I'm working with a large public records database but when I whittle it down to just the date range and category that I need, there are a ton of columns that aren't in use. Some are blank and some are NA.

The selected answer (https://stackoverflow.com/a/17672737/233467) didn't work for me, but this did:

df[!sapply(df, function (x) all(is.na(x) | x == ""))]
Amanda
4

It depends on what you mean by empty: is it NA or "", or can it even be " "? Something like this might work:

df[, !apply(df, 2, function(x) all(gsub(" ", "", x) == "", na.rm = TRUE))]
4

This can also be done with dplyr and select_if:

select_if(df, function(x) { any(!is.na(x)) })

or with is.null() or x == "", depending on how empty values are defined in your data.
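
For example, if empty means the empty string "" rather than NA, a hedged variant along the same lines (na.rm = TRUE means all-NA columns are dropped too) might be:

select_if(df, function(x) { !all(x == "", na.rm = TRUE) })

In current dplyr, select_if() is superseded by select() with where(), e.g. select(df, where(~ any(!is.na(.x)))).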

Gauti
2

Here is something that can be modified to exclude columns containing any of the values specified.

newdf <- df[, apply(df, 2, function(x) !any(is.na(x) | x == "" | x == "-4"))]
Eddy Zavala
2

A simple solution using the purrr package:

purrr::discard(my_data_frame, ~all(is.na(.)))

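The same idea extends to empty strings; a hedged variant that treats both NA and "" as empty might be:

purrr::discard(my_data_frame, ~ all(is.na(.x) | .x == ""))
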
Lior Polat
-1

If you know the column indices, you can use

df[,-c(3, 5, 7)]

This will omit columns 3, 5, 7.

Tomas