group_by and count number of elements in each column in R

Question

I have a data table like below:

city         year    t_20   t_25 
Seattle      2019    82      91  
Seattle      2018     0      103   
NYC          2010    78       8 
DC           2011    71       0  
DC           2011     0       0    
DC           2018    60       0

I would like to group them by city and year and count the number of zeros in each group.

How can I do this? by summarize_at?

df %>% group_by(city, year) %>% summarise_at( WHAT GOES HERE , vars(t_20:t_25))

What should be the first argument of summarize_at?

or any other way? tally?

Maurits Evers · Accepted Answer · 2019-03-01T05:27:56.183

An option is to reshape from wide to long before summariseing

library(tidyverse)
df %>%
    gather(k, v, -city, -year) %>%
    group_by(city, year) %>%
    summarise(n_0 = sum(v == 0)) 
#    # A tibble: 5 x 3
## Groups:   city [?]
#  city     year   n_0
#  <fct>   <int> <int>
#1 DC       2011     3
#2 DC       2018     1
#3 NYC      2010     0
#4 Seattle  2018     1
#5 Seattle  2019     0

To summarise for each column separate you can do

df %>%
    group_by(city, year) %>%
    summarise_all(funs(sum(. == 0)))
## A tibble: 5 x 4
## Groups:   city [?]
#  city     year  t_20  t_25
#  <fct>   <int> <int> <int>
#1 DC       2011     1     2
#2 DC       2018     0     1
#3 NYC      2010     0     0
#4 Seattle  2018     1     0
#5 Seattle  2019     0     0

Sample data

df <- read.table(text =
    "city         year    t_20   t_25
Seattle      2019    82      91
Seattle      2018     0      103
NYC          2010    78       8
DC           2011    71       0
DC           2011     0       0
DC           2018    60       0", header = T)

Thank you. But in your result, it is not clear how many zeros are in each column `t_20` and `t_25`. It says for e.g. `DC 2011 3`, but I want to know how many zeros in each column exist. — OverFlow Police, Mar 01 '19 at 05:25
I see @NoLie; in that case I misunderstood; I thought the whole exercise was to sum across all columns. To do it for every column separately is even shorter, see my update. — Maurits Evers, Mar 01 '19 at 05:28

score 0 · Answer 2 · answered Mar 01 '19 at 05:11

0

A simple group by operation lends itself well to be formulated using SQL. For those SQL inclined, we could also try to solve this problem using the sqldf library:

library(sqldf)

sql <- "SELECT city, COUNT(CASE WHEN t_20 = 0 THEN 1 END) AS t_20_cnt,
            COUNT(CASE WHEN t_25 = 0 THEN 1 END) AS t_25_cnt
        FROM df
        GROUP BY city"

output <- sqldf(sql)

answered Mar 01 '19 at 05:11

Tim Biegeleisen

502,043
27
286
360

Could also try: `SELECT city, SUM(t_20 = 0) AS t_20_cnt, SUM(t_25 = 0) AS t_25_cnt ...` – G. Grothendieck Mar 01 '19 at 13:18
@G.Grothendieck I didn't know SQLite supported that syntax. Good to know. – Tim Biegeleisen Mar 01 '19 at 13:19

group_by and count number of elements in each column in R

2 Answers2

Sample data