I have a function that counts the zeros in each column of a large data frame. Now I want to count the zeros in each column after grouping by a category column. Here is an example:

   zero_rate <- function(df) {
     z_rate_list <- sapply(df, function(x) {
       data.frame(
         n_zero = length(which(x == 0)),
         n = length(x),
         z_rate = length(which(x == 0)) / length(x))
     })

     d <- data.frame(z_rate_list)
     d <- sapply(d, unlist)
     d <- as.data.frame(d)

     return(d)
   }

   library(dplyr)  # provides %>%, group_by() and do()

   df <- data.frame(var1 = c(1, 0, NA, 4, NA, 6, 7, 0, 0, 10),
                    var2 = c(11, NA, NA, 0, NA, 16, 0, NA, 19, NA))
   df1 <- data.frame(cat = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), df)

   zero_rate_df <- df1 %>% group_by(cat) %>% do(zero_rate(.))

Here zero_rate(df) works just as I expected. But when I group the data by cat and compute zero_rate for each column within each category, the result is not what I expected. I expect something like this:

   cat          var1  var2
    1   n_zero  1     1
        n       5     5
        z_rate  0.2   0.2
    2   n_zero  2     1
        n       5     5
        z_rate  0.4   0.2

Any suggestions? Thank you.

newleaf

1 Answer

I came up with the following code. .[-1] drops the first column of each group's data (the grouping column cat) before it is passed to zero_rate:

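library(dplyr)

zero_rate <- function(df){
    # for each column: number of zeros (NAs ignored), length, and their ratio
    res <- lapply(df, function(x){
        y <- c(sum(x == 0, na.rm = TRUE), length(x))
        c(y, y[1]/y[2])
    })
    # one column per variable, one row per statistic
    res <- do.call(cbind.data.frame, res)
    res$vars <- c('n_zero', 'n', 'z_rate')
    res
}

df1 %>% group_by(cat) %>% do( zero_rate(.[-1]))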

#     cat  var1  var2   vars
#   <dbl> <dbl> <dbl>  <chr>
# 1     1   1.0   1.0 n_zero
# 2     1   5.0   5.0      n
# 3     1   0.2   0.2 z_rate
# 4     2   2.0   1.0 n_zero
# 5     2   5.0   5.0      n
# 6     2   0.4   0.2 z_rate
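
For comparison, here is a minimal sketch of the same idea without dplyr, assuming only base R. It rebuilds the example data from the question, reuses the zero_rate function above, and loops over the distinct values of cat by hand. The names cats and by_cat are made up for this illustration, and the cat column is assumed to be the first column, as in the example.

```r
# Example data from the question
df <- data.frame(var1 = c(1, 0, NA, 4, NA, 6, 7, 0, 0, 10),
                 var2 = c(11, NA, NA, 0, NA, 16, 0, NA, 19, NA))
df1 <- data.frame(cat = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), df)

# Same per-column statistics as in the answer above
zero_rate <- function(df) {
    res <- lapply(df, function(x) {
        y <- c(sum(x == 0, na.rm = TRUE), length(x))
        c(y, y[1] / y[2])
    })
    res <- do.call(cbind.data.frame, res)
    res$vars <- c('n_zero', 'n', 'z_rate')
    res
}

# Hypothetical helper names: cats holds the distinct categories,
# by_cat the stacked per-category results.
cats <- unique(df1$cat)
by_cat <- do.call(rbind, lapply(cats, function(k) {
    # keep the rows of category k and drop the grouping column (column 1);
    # drop = FALSE keeps a data frame even with a single remaining column
    piece <- zero_rate(df1[df1$cat == k, -1, drop = FALSE])
    cbind(cat = k, piece)
}))

print(by_cat)
```

This should give the same six rows as the dplyr version, with cat as an ordinary column rather than a grouping variable.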
mt1022