I have a data frame with several sites (columns) and several variables (rows). I need to count the number of non-zero values for each site. I feel like I should be able to do this with summarize() combined with count() or tally(), but I can't quite figure it out.

reprex:


library(tibble)  # tribble()
library(dplyr)   # %>%, summarise(), across()

df <- 
  tribble(
    ~variable,   ~site1,   ~site2,  ~site3,
    "var1",        0,        1,        0,
    "var2",        .5,       0,        0,
    "var3",        .1,       2,        0,
    "var4",        0,        .8,       1
  )


# does not work:
df %>%
  summarise(across(where(is.numeric), ~ count(.x>0)))

desired output:

# A tibble: 1 × 3
  site1 site2 site3
  <dbl> <dbl> <dbl>
1   2     3     1
AndrewGB
Jake L

3 Answers

A possible solution:

library(dplyr)

df %>% 
  summarise(across(starts_with("site"), ~ sum(.x != 0)))

#> # A tibble: 1 × 3
#>   site1 site2 site3
#>   <int> <int> <int>
#> 1     2     3     1
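
To see why `sum(.x != 0)` counts values, here is a minimal base R sketch using the `site1` column from the question, copied into a standalone vector. The names `nonzero` and `n_nonzero` are only illustrative. The comparison produces a logical vector, and `sum()` treats `TRUE` as 1 and `FALSE` as 0:

```r
# site1 values from the question, as a plain numeric vector
site1 <- c(0, 0.5, 0.1, 0)

# Element-wise comparison gives FALSE TRUE TRUE FALSE
nonzero <- site1 != 0

# Summing a logical vector counts the TRUE entries
n_nonzero <- sum(nonzero)
print(n_nonzero)
#> [1] 2
```

`across()` simply applies this same expression to each selected column.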

Another possible solution, in base R:

apply(df[-1], 2, \(x) sum(x != 0))

#> site1 site2 site3 
#>     2     3     1
PaulS

In base R, you can use colSums:

colSums(df[-1] > 0)

#> site1 site2 site3 
#>     2     3     1 
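
One caveat, sketched under an assumption: `> 0` counts strictly positive values, which equals the non-zero count for the data in the question because nothing there is negative. With a hypothetical data frame `m` that contains a negative entry (invented here purely for illustration), the two comparisons differ, and `!= 0` is the one that counts every non-zero value:

```r
# Hypothetical data with one negative value (not from the question)
m <- data.frame(site1 = c(0, -0.5, 0.1), site2 = c(1, 0, 2))

positive <- colSums(m > 0)   # counts strictly positive values only
nonzero  <- colSums(m != 0)  # counts every non-zero value

print(positive)
#> site1 site2 
#>     1     2 
print(nonzero)
#> site1 site2 
#>     2     2 
```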
Maël

Here is another tidyverse option using purrr:

library(tidyverse)

df[,-1] %>% 
  map_dbl(~sum(. != 0))

# site1 site2 site3 
#   2     3     1 

Or an option using data.table:

library(data.table)

as.data.table(df[,-1])[, lapply(.SD, function(x) sum(x!=0))]

#   site1 site2 site3
#1:     2     3     1
AndrewGB