2

I have a data table like below:

city         year    t_20   t_25 
Seattle      2019    82      91  
Seattle      2018     0      103   
NYC          2010    78       8 
DC           2011    71       0  
DC           2011     0       0    
DC           2018    60       0

I would like to group them by city and year and count the number of zeros in each group.

How can I do this? by summarize_at?

df %>% group_by(city, year) %>% summarise_at( WHAT GOES HERE , vars(t_20:t_25))

What should be the first argument of summarize_at?

or any other way? tally?

OverFlow Police
  • 861
  • 6
  • 23

2 Answers2

1

An option is to reshape from wide to long before summariseing

library(tidyverse)
df %>%
    gather(k, v, -city, -year) %>%
    group_by(city, year) %>%
    summarise(n_0 = sum(v == 0)) 
#    # A tibble: 5 x 3
## Groups:   city [?]
#  city     year   n_0
#  <fct>   <int> <int>
#1 DC       2011     3
#2 DC       2018     1
#3 NYC      2010     0
#4 Seattle  2018     1
#5 Seattle  2019     0

To summarise for each column separate you can do

df %>%
    group_by(city, year) %>%
    summarise_all(funs(sum(. == 0)))
## A tibble: 5 x 4
## Groups:   city [?]
#  city     year  t_20  t_25
#  <fct>   <int> <int> <int>
#1 DC       2011     1     2
#2 DC       2018     0     1
#3 NYC      2010     0     0
#4 Seattle  2018     1     0
#5 Seattle  2019     0     0

Sample data

df <- read.table(text =
    "city         year    t_20   t_25
Seattle      2019    82      91
Seattle      2018     0      103
NYC          2010    78       8
DC           2011    71       0
DC           2011     0       0
DC           2018    60       0", header = T)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Thank you. But in your result, it is not clear how many zeros are in each column `t_20` and `t_25`. It says for e.g. `DC 2011 3`, but I want to know how many zeros in each column exist. – OverFlow Police Mar 01 '19 at 05:25
  • I see @NoLie; in that case I misunderstood; I thought the whole exercise was to sum across all columns. To do it for every column separately is even shorter, see my update. – Maurits Evers Mar 01 '19 at 05:28
  • 1
    Great. Thank you very much indeed – OverFlow Police Mar 01 '19 at 05:33
0

A simple group by operation lends itself well to be formulated using SQL. For those SQL inclined, we could also try to solve this problem using the sqldf library:

library(sqldf)

sql <- "SELECT city, COUNT(CASE WHEN t_20 = 0 THEN 1 END) AS t_20_cnt,
            COUNT(CASE WHEN t_25 = 0 THEN 1 END) AS t_25_cnt
        FROM df
        GROUP BY city"

output <- sqldf(sql)
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360