3

This may seem like a duplicate question but maybe I am missing something here.

I have been trying to remove just the columns where the sum of absolute values add to zero from a data.table.

I searched and found many solutions on this site that claim to work, and in fact, when I copy/paste exact code, it does work. However, for some reason, I can not duplicate it with my data.table.

The result of almost anything I do turns my data.table into a list. I even tried to convert my data.table to data.frame to try these solutions with no luck.

from here:

SelectVar[, colSums(SelectVar != 0) > 0]

Does not work.

SelectVar[, !apply(SelectVar == 0, 2, all)]

Does not work either.

remove_zero_cols <- function(df) {
  rem_vec <- NULL
  for(i in 1:ncol(df)){
    this_sum <- summary(df[,i])
    zero_test <- length(which(this_sum == 0))
    if(zero_test == 6) {
      rem_vec[i] <- names(df)[i]
    }
  }
  features_to_remove <- rem_vec[!is.na(rem_vec)]
  rem_ind <- which(names(df) %in% features_to_remove)
  df <- df[,-rem_ind]
  return(df)
}

This function also does not work.

I checked the class of each parameter and they are all either numeric or integer types. I also checked for any NA's and found none.

Any suggestions?

www
  • 38,575
  • 12
  • 48
  • 84
DevGin
  • 443
  • 3
  • 12
  • I've been avoiding iterating manually with a for loop checking one column at a time - perhaps that is the solution I need? I was trying to do this a little more elegantly if possible. – DevGin May 17 '18 at 02:02
  • 1
    Question: If the sum of _absolute_ values is zero, then it means that every value in the column is also zero. Is this what you intended to say? – Tim Biegeleisen May 17 '18 at 02:03
  • Yes. I want to drop data.table columns that ONLY contain 0's all the way down each row. – DevGin May 17 '18 at 02:12

4 Answers4

3

Add with = FALSE to the first solution you referenced to if you are working on a data.table.

# Create example data frame
SelectVar <- read.table(text = "    a   b  c   d   e   f   g   h   i j k l ll m n o p  q   r
1 Dxa8 Dxa8 0 Dxa8 Dxa8 0 Dxa8 Dxa8 0 0 0 0  0 0 0 0 0 Dxc8 0
2 Dxb8 Dxc8 0 Dxe8 Dxi8 0 tneg tpos 0 0 0 0  0 0 0 0 0 Dxi8 0",
                        header = TRUE, stringsAsFactors = FALSE)

# Convert to a data.table
library(data.table)

setDT(SelectVar)

SelectVar[, colSums(SelectVar != 0) > 0, with = FALSE]
#       a    b    d    e    g    h    q
# 1: Dxa8 Dxa8 Dxa8 Dxa8 Dxa8 Dxa8 Dxc8
# 2: Dxb8 Dxc8 Dxe8 Dxi8 tneg tpos Dxi8
www
  • 38,575
  • 12
  • 48
  • 84
2

The OP has requested to remove just the columns where the sum of absolute values add to zero. Later on, he has clarified that he wants to drop data.table columns that ONLY contain 0's all the way down each row.

This can be achieved by using the any() function

library(data.table)

#create sample data
n_rows <- 10L
n_cols <-  5L
DT <- data.table(id = 1:n)
dat_cols <- sprintf("dat%i", seq.int(n_cols))
for (j in seq.int(n_cols)) set(DT, NULL, dat_cols[j], 0L)
set.seed(1L)
DT[sample.int(n_rows, 0.1 * n_rows), (sample.int(n_cols, 0.5 * n_cols)) := 1L]
DT
    id dat1 dat2 dat3 dat4 dat5
 1:  1    0    0    0    0    0
 2:  2    0    1    1    0    0
 3:  3    0    0    0    0    0
 4:  4    0    0    0    0    0
 5:  5    0    0    0    0    0
 6:  6    0    0    0    0    0
 7:  7    0    0    0    0    0
 8:  8    0    0    0    0    0
 9:  9    0    0    0    0    0
10: 10    0    0    0    0    0
# find columns which are all zero using any()
dat_cols <- sprintf("dat%i", seq.int(n_cols))
zero_cols <- setDT(DT)[, lapply(.SD, function(x) !any(x)), 
                .SDcols = dat_cols]
# remove columns in place
DT[, (names(which(unlist(zero_cols)))) := NULL][]
    id dat2 dat3
 1:  1    0    0
 2:  2    1    1
 3:  3    0    0
 4:  4    0    0
 5:  5    0    0
 6:  6    0    0
 7:  7    0    0
 8:  8    0    0
 9:  9    0    0
10: 10    0    0
Uwe
  • 41,420
  • 11
  • 90
  • 134
1

Here's a tidyverse solution. You could convert your data.table into a tibble and then go from there.

library(tidyverse)
df <- tibble(a = 1:5, b = -1:3, c = 0)

selection_criteria <- (colSums(abs(df)) != 0)
df[selection_criteria]
Ben G
  • 4,148
  • 2
  • 22
  • 42
1

Before:

library(tidy verse)
DT = as_tibble(list(x=c(1,0), y=c(0,0)))
DT

A tibble: 2 x 2
        x     y
      <dbl> <dbl>
  1     1     0
  2     0     0

Use:

DT1 = DT %>% select_if(any)
DT1

After:

  tibble: 2 x 1
        x
      <dbl>
  1     1
  2     0
Charlie
  • 11
  • 2