38

I have a data.frame:

SelectVar
     a   b  c   d   e   f   g   h   i j k l ll m n o p  q   r
1 Dxa8 Dxa8 0 Dxa8 Dxa8 0 Dxa8 Dxa8 0 0 0 0  0 0 0 0 0 Dxc8 0
2 Dxb8 Dxc8 0 Dxe8 Dxi8 0 tneg tpos 0 0 0 0  0 0 0 0 0 Dxi8 0

I would like to remove the columns with zero values in both rows from the data frame, so it yields a data frame as below:

SelectVar
     a   b    d    e    g   h     q   
1 Dxa8 Dxa8 Dxa8 Dxa8 Dxa8 Dxa8  Dxc8 
2 Dxb8 Dxc8 Dxe8 Dxi8 tneg tpos  Dxi8 

Have tried:

SelectVar!=0

which yields a True/False dataframe, and:

SelectVar[, colSums(abs(SelectVar)) ! == 0]

which yields an error.

How could I remove the columns with zero values in each row?

Sam Firke
  • 21,571
  • 9
  • 87
  • 105
Barnaby
  • 1,472
  • 4
  • 20
  • 34

10 Answers10

64

You almost have it. Put those two together:

 SelectVar[, colSums(SelectVar != 0) > 0]

This works because the factor columns are evaluated as numerics that are >= 1.

Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
  • 4
    No, it does not work with `NA` values. If `NA` value are present, replace the test with `!is.na(colSums(SelectVar != 0)) & colSums(SelectVar != 0) > 0` (or equivalent). – Matthew Lundberg Feb 03 '14 at 15:12
  • 6
    With `NA` just try: `SelectVar[, colSums(SelectVar != 0, na.rm = TRUE) > 0]` – mpalanco Jul 14 '15 at 09:50
  • @MatthewLundberg - Will this `SelectVar[, colSums(SelectVar == 0) == 0]` remove all columns with sum as zero? – Chetan Arvind Patil Aug 07 '17 at 21:13
  • @ChetanArvindPatil No. The sum is taken over the expression `SelectVar == 0`, a logical, and gives the number of zero values. That expression selects (not removes) all columns with no zero values. To remove all columns that sum as zero: `SelectVar[, colSums(selectVar) != 0]` – Matthew Lundberg Aug 07 '17 at 22:28
  • @MatthewLundberg - Ok. What about this `SelectVar <- SelectVar [, colSums(SelectVar == 0) != nrow(SelectVar )]`? – Chetan Arvind Patil Aug 07 '17 at 22:34
  • @ChetanArvindPatil Looks like that removes a column if it contains any `0` value. – Matthew Lundberg Aug 07 '17 at 23:00
  • 1
    @M.Mashaye That will return identical results in all cases. `abs(x) == 0` is equivalent to `x == 0`. – Matthew Lundberg Jan 23 '18 at 20:24
  • @MatthewLundberg what package does SelectVar come with, I've tried googling it but all I can find is info under the kimod package, is this correct? – TheGoat Jun 19 '18 at 11:59
  • 1
    @TheGoat That's the name of the `data.frame` in the question. – Matthew Lundberg Jun 19 '18 at 14:14
  • i'm worried that about negative values in the column, probably an absolute value is needed there; or shoot what if the col just happens to sum to exactly 0 with eq value pos and negative – tofutim Jul 27 '18 at 19:19
  • @tofutim There's not a problem there. `colSums` is taken over the logical `SelectVar != 0` so it is the count of nonzero elements. It is not the sum of the column values. – Matthew Lundberg Jul 28 '18 at 20:29
  • Hey, is it possible to the get the names of the columns which will be deleted? For example in this case, "i" "j" "k" "ll" "m" "n" "o" "p"? – user10579790 May 10 '19 at 07:51
11

A dplyr friendly solution:

SelectVar %>% select_if(colSums(.) != 0)

seapen
  • 345
  • 1
  • 4
  • 13
  • but what if there are negative values and negatives and positivea add up to 0? you would be removing those columns with this method, right? – charlesdarwin Oct 31 '20 at 02:57
  • Good catch @charlesdarwin! Most likely, yes. Though I've recently found another dplyr/tidyverse friendly solution ```purrr::discard(~all(is.na(.))) ``` – seapen Dec 14 '20 at 21:30
10

One option since dplyr 1.0.0 could be:

df %>%
 select(where(~ any(. != 0)))

     a    b    d    e    g    h    q
1 Dxa8 Dxa8 Dxa8 Dxa8 Dxa8 Dxa8 Dxc8
2 Dxb8 Dxc8 Dxe8 Dxi8 tneg tpos Dxi8
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
5

Try also

SelectVar[, !apply(SelectVar == 0, 2, all)]

This was taken from here:

Delete all columns with 0 from matrix

Community
  • 1
  • 1
Fernando
  • 7,785
  • 6
  • 49
  • 81
5

The faster option, by about 40% according to mean execution times, is

df[,-(which(colSums(df)==0))]

We can benchmark the two options with a simple example data frame consisting of 3,000 columns and two observations.

# Create simple 2 X 3000 data frame with many 1s and 0s
# 500 columns have all 0s
df = matrix(c(rep(c(0,1,1),1000),rep(c(1,0,0),1000)),nrow=2)
df = as.data.frame(df)

# Benchmark the two options in milliseconds, 100 times
library(microbenchmark)
microbenchmark(
  df[,colSums(df != 0) > 0],
  df[,-(which(colSums(df)==0))]
  )

Unit: milliseconds
                             expr     min       lq     mean   median       uq      max neval
       df[, colSums(df != 0) > 0] 23.3844 24.77905 30.24852 26.37730 29.17175 140.6486   100
 df[, -(which(colSums(df) == 0))] 17.3664 19.12815 21.58901 20.59055 22.29905  41.9485   100

Dale Kube
  • 1,400
  • 13
  • 24
1

All above answers are valid -
but for some of us, perhaps the most intuitive solution to read following the goal remove columns with zero values is:

df %>% select(where(~ sum(.) != 0))

You can read it as:
For every column (~), remove it (!=) if its sum (sum(.)) is zero

Agile Bean
  • 6,437
  • 1
  • 45
  • 53
1

simple answer that remove columns with any zeros:

columns_to_keep = (colSums(SelectVar != 0) == nrow(SelectVar))
NewTable = SelectVar[, columns_to_keep]
Community
  • 1
  • 1
0

To remove any and all columns that contain only zeros, simply pass your data frame into the following function:

remove_zero_cols <- function(df) {
  rem_vec <- NULL
  for(i in 1:ncol(df)){
    this_sum <- summary(df[,i])
    zero_test <- length(which(this_sum == 0))
    if(zero_test == 6) {
      rem_vec[i] <- names(df)[i]
    }
  }
  features_to_remove <- rem_vec[!is.na(rem_vec)]
  rem_ind <- which(names(df) %in% features_to_remove)
  df <- df[,-rem_ind]
  return(df)
}

Example:

iris$Sepal.Width <- 0
new_df <- remove_zero_cols(iris)
print(new_df)
Cybernetic
  • 12,628
  • 16
  • 93
  • 132
0

you can try something like

   [row, column] = SelectVar.shape
    for j in range(column):
       if np.all(SelectVar.iloc[:, j] == 0):
           SelectVar = SelectVar.drop(SelectVar.columns[j], axis=1, inplace=True)
Himanshu Aggarwal
  • 163
  • 1
  • 1
  • 5
0

Late answer, but one other base R approach which can work here would be to assert that either the minimum or maximum value in each column is not zero:

colMax <- sapply(SelectVar, max, na.rm=TRUE)
colMin <- sapply(SelectVar, min, na.rm=TRUE)
SelectVar[, colMin != 0 | colMax != 0]
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360