Cross tabulation with n>2 categories in R: hide rows with zero cases

Question

I'm trying to make a cross tabulation in R whose output resembles, as closely as possible, what I'd get from an Excel pivot table. Given this code:

set.seed(2)
df <- data.frame(
  ministry   = paste("ministry ", sample(1:3, 20, replace = TRUE)),
  department = paste("department ", sample(1:3, 20, replace = TRUE)),
  program    = paste("program ", sample(letters[1:20], 20, replace = FALSE)),
  budget     = runif(20) * 1e6
)
library(tables)
library(dplyr)
arrange(df, ministry, department, program)
tabular(ministry * department ~ ((Count = budget) + (Avg = (mean * budget)) + (Total = (sum * budget))), data = df)

which yields:

                                 Avg    Total  
 ministry    department    Count budget budget 
 ministry  1 department  1 5     479871 2399356
             department  2 1     770028  770028
             department  3 1     184673  184673
 ministry  2 department  1 2     170818  341637
             department  2 1     183373  183373
             department  3 3     415480 1246440
 ministry  3 department  1 0        NaN       0    <---- LOOK HERE
             department  2 5     680102 3400509
             department  3 2     165118  330235

How do I get the output to hide the rows with zero frequencies? I'm using tables::tabular, but any other package is fine as long as there's a way, even an indirect one, to output HTML. The goal is to generate HTML or LaTeX with R Markdown and display my script's results the way Excel would, in the pivot-table-like form of the example above, but without the superfluous row.

Thanks!

score 0 · Answer 1 · answered May 13 '15 at 20:02

Why not just use dplyr?

df %>%
  group_by(ministry, department) %>%
  summarise(count = n(),
            avg_budget = mean(budget, na.rm = TRUE),
            tot_budget = sum(budget, na.rm = TRUE))



     ministry    department count avg_budget tot_budget
1 ministry  1 department  1     5   479871.1  2399355.6
2 ministry  1 department  2     1   770027.9   770027.9
3 ministry  1 department  3     1   184673.5   184673.5
4 ministry  2 department  1     2   170818.3   341636.5
5 ministry  2 department  2     1   183373.2   183373.2
6 ministry  2 department  3     3   415479.9  1246439.7
7 ministry  3 department  2     5   680101.8  3400508.8
8 ministry  3 department  3     2   165117.6   330235.3

I wasn't clear enough in the original question, I'm sorry. In my script I've already processed the data; what I need is to display the results as in the example provided, in a pivot-like format, just without the extra zero-filled row. - s_a, May 13 '15 at 22:28
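
If a plain data frame is acceptable for display, the pivot-like layout can be approximated without the tables package. The following is only a sketch in base R, not the layout that tabular produces: it assumes a small made-up data set (the budgets are illustrative, not the question's values), and the names `stats` and `pivot` are hypothetical. It relies on aggregate() returning only the combinations that actually occur, then blanks repeated ministry labels so each ministry is printed once.

```r
# Toy data with made-up budgets: "ministry 2" has no rows for "department 1".
df <- data.frame(
  ministry   = c("ministry 1", "ministry 1", "ministry 1", "ministry 2", "ministry 2"),
  department = c("department 1", "department 1", "department 2", "department 2", "department 2"),
  budget     = c(100, 300, 50, 10, 30)
)

# aggregate() only returns ministry/department combinations present in the
# data, so a zero-count row is never created.
stats <- aggregate(
  budget ~ ministry + department,
  data = df,
  FUN = function(x) c(Count = length(x), Avg = mean(x), Total = sum(x))
)

# The three statistics come back as one matrix column; spread them out.
pivot <- data.frame(stats[c("ministry", "department")], stats$budget)

# Sort by ministry, then department, and reset the row names.
pivot <- pivot[order(pivot$ministry, pivot$department), ]
rownames(pivot) <- NULL

# Blank repeated ministry labels to mimic a pivot table's grouped rows.
pivot$ministry <- as.character(pivot$ministry)
pivot$ministry[duplicated(pivot$ministry)] <- ""

print(pivot, row.names = FALSE)
```

In an R Markdown document, a data frame like this could then be passed to a table renderer such as knitr::kable to produce HTML or LaTeX.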

score 0 · Answer 2 · edited May 23 '17 at 11:58

While I don't understand at all how the tabular object is made (it says it's a list but seems to behave like a data frame), you can select cells as usual:

> results <- tabular(ministry * department ~ ((Count = budget) + (Avg = (mean * budget)) + (Total = (sum * budget))), data = df)
> results[results[, 1] != 0, ]

                                 Avg    Total  
 ministry    department    Count budget budget 
 ministry  1 department  1 5     479871 2399356
             department  2 1     770028  770028
             department  3 1     184673  184673
 ministry  2 department  1 2     170818  341637
             department  2 1     183373  183373
             department  3 3     415480 1246440
 ministry  3 department  2 5     680102 3400509
             department  3 2     165118  330235

That's the solution.

I found this solution thanks to a reply by this user on another question: https://stackoverflow.com/users/516548/g-grothendieck
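
The row filter used here is ordinary logical indexing: compare the first column with zero and keep the rows where the comparison is TRUE. As a rough illustration only, here is the same idiom on a plain matrix holding the Count and Total columns copied from the table above. The matrix `m` is just a stand-in and says nothing about how tabular stores its cells; `keep` and `filtered` are hypothetical names.

```r
# Count and Total values copied from the tabular output above.
m <- cbind(
  Count = c(5, 1, 1, 2, 1, 3, 0, 5, 2),
  Total = c(2399356, 770028, 184673, 341637, 183373, 1246440, 0, 3400509, 330235)
)

# Logical vector: FALSE only for the row whose count is zero.
keep <- m[, "Count"] != 0

# Keep every column, drop the zero-count row; drop = FALSE preserves the
# matrix shape even if a single row were left.
filtered <- m[keep, , drop = FALSE]

nrow(filtered)  # 8 rows remain out of the original 9
```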
