df <- data.frame(
 id = rep(letters[1:3], 9),
 m1 = ceiling(rnorm(9, 10, 3)),
 m2 = ceiling(rnorm(9, 10, 6)),
 m3 = 0
 )

head(df) 

 id m1 m2 m3
1  a 12 14  0
2  b 11  9  0
3  c 10 10  0
4  a 16  1  0
5  b  5 15  0
6  c  8  7  0

I have a data frame with metadata in the left-most columns and a raw data matrix attached to the right side. I'd like to remove the columns on the right side that sum to zero, without splitting the data frame into two separate objects, using dplyr::select_if. Selecting only the metadata columns works as expected:

df %>% 
  select_if(!(grepl("m",names(.)))) %>% 
  head()

  id
1  a
2  b
3  c
4  a
5  b
6  c

When I attempt to add a second term to evaluate whether the raw data columns (indicated by "m" prefix) sum to zero, I get the following error message:

> df %>% 
+   select_if(!(grepl("m",names(.))) || sum(.) > 0)

Error in `select_if()`:
! `.p` is invalid.
✖ `.p` should have the same size as the number of variables in the tibble.
ℹ `.p` is size 1.
ℹ The tibble has 4 columns, including the grouping variables.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
In !(grepl("m", names(.))) || sum(.) > 0 :
  'length(x) = 4 > 1' in coercion to 'logical(1)'

> rlang::last_error()

<error/rlang_error>
Error in `select_if()`:
! `.p` is invalid.
✖ `.p` should have the same size as the number of variables in the tibble.
ℹ `.p` is size 1.
ℹ The tibble has 4 columns, including the grouping variables.

I greatly appreciate any assistance with this!

user438383
Alex Romer
    Do you want `df %>% select(where(~ is.numeric(.x) && sum(.x) > 0))`? Note that the `_if`/`_at` variants are all deprecated in favor of `where`. – akrun Jan 17 '23 at 19:59

2 Answers


As @akrun already pointed out in the comments, select_if() has been superseded in favor of select() with where(). We can select() all variables that don't start with "M" using !starts_with("M") (starts_with() ignores case by default, so this matches m1, m2 and m3), together with all variables that are numeric and whose sum is greater than zero using where(~ is.numeric(.x) && sum(.x) > 0).

Here the double && operator is important because it short-circuits. We first check whether a column is numeric, and only if it is does evaluation move on to check whether the sum is greater than zero. With a single &, both sides are always evaluated, so sum() would be called on the character column id and throw an error.
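The short-circuit behaviour can be checked in isolation with base R. This is only a minimal sketch: the vector x is a made-up stand-in for a non-numeric column such as id, and the names short_circuit and vectorised are illustrative.

```r
x <- c("a", "b", "c")  # hypothetical stand-in for a character column

# `&&` short-circuits: is.numeric(x) is FALSE, so sum(x) is never evaluated
short_circuit <- is.numeric(x) && sum(x) > 0

# `&` evaluates both sides, so sum() is called on a character vector and fails;
# tryCatch() captures the error message instead of stopping the script
vectorised <- tryCatch(
  is.numeric(x) & sum(x) > 0,
  error = function(e) conditionMessage(e)
)

print(short_circuit)
print(vectorised)
```

The first result is a plain FALSE, while the second holds the error message raised by sum().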

library(dplyr)

df %>%
  select(!starts_with("M"),
         where(~ is.numeric(.x) && sum(.x) > 0))

#>    id m1 m2
#> 1   a 12 18
#> 2   b 13 24
#> 3   c  6 12
#> 4   a 11  8
#> 5   b  9  0
#> 6   c 12  2
#> 7   a 11  9
#> 8   b 12  4
#> 9   c  4  8
#> 10  a 12 18
#> 11  b 13 24
#> 12  c  6 12
#> 13  a 11  8
#> 14  b  9  0
#> 15  c 12  2
#> 16  a 11  9
#> 17  b 12  4
#> 18  c  4  8
#> 19  a 12 18
#> 20  b 13 24
#> 21  c  6 12
#> 22  a 11  8
#> 23  b  9  0
#> 24  c 12  2
#> 25  a 11  9
#> 26  b 12  4
#> 27  c  4  8

Created on 2023-01-17 with reprex v2.0.2
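As for the error in the question: `||` is a scalar operator, so the predicate collapsed to a single value instead of one value per column, which appears to be what the "`.p` is size 1" message is complaining about. A rough base R sketch of the per-column logic follows, assuming a small made-up data frame in place of the random one; the names is_meta, nonzero and keep are illustrative only.

```r
# Small deterministic stand-in for the question's data (values are made up)
df <- data.frame(
  id = c("a", "b", "c"),
  m1 = c(12, 11, 10),
  m2 = c(14, 9, 10),
  m3 = 0
)

# One logical per column: TRUE for metadata columns (no "m" prefix)
is_meta <- !grepl("^m", names(df))

# One logical per column: TRUE for numeric columns whose sum is not zero
# (`&&` is fine here because each call sees a single column)
nonzero <- vapply(df, function(col) is.numeric(col) && sum(col) != 0, logical(1))

# Combine element-wise with `|` (not `||`) so the result keeps length ncol(df)
keep <- is_meta | nonzero
result <- df[, keep, drop = FALSE]

print(names(result))
```

The select(!starts_with("M"), where(...)) call above expresses the same union of the two conditions in tidyselect syntax.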

TimTeaFan

I could not find an answer that uses only select_if; however, I tried an alternative approach. Here the column 'm3' gets dropped because its values sum to zero.

library(dplyr)

# get the names of the columns with numeric data
vec <- names(select_if(df, is.numeric))
# keep only the names of the columns that do not sum to zero
vec2 <- vec[colSums(df[, vec, drop = FALSE]) != 0]
# then use that vector to select the columns; all_of() marks vec2 as an external character vector
df %>% select(id, all_of(vec2))


jkatam