How to sum ID within a DF by area in R?

Question

I have a dataframe of crash statistics called crashes_TA. The dataframe looks like the following, but on a much larger scale, with each row representing a crash.

TA_name	TA_code	fatal_count	serious_injury_count	minor_injury_count	ID
Grey	061	2	0	1	1
Buller	062	1	1	1	2
Grey	061	1	1	1	3
Clutha	063	0	1	1	4
Clutha	063	1	1	2	5
Otago	064	1	1	0	6

I would like to sum fatal_count, serious_injury_count, and minor_injury_count by TA_name, and add a new column called casualties that holds their total. I would also like to summarise ID to get the number of crashes per region. This value differs from casualties because not all crashes have casualties. This new column would be called crashes.

My new dataframe would then look like this:

TA_name	TA_code	fatal_count	serious_injury_count	minor_injury_count	casualties	crashes
Grey	061	3	1	2	6	2
Buller	062	1	1	1	3	1
Clutha	063	1	2	3	6	2
Otago	064	1	1	0	2	1

This is the code I have tried so far:

crashes_stats_TA <- crashes_TA %>% 
  group_by(TA_code, TA_name) %>%
  summarise(across(contains("count"), ~sum(., na.rm = T)),
            across(Population, ~mean(., na.rm = T),
            across(contains("perc"), ~mean(., na.rm = T), .names = "{.col}_mean"))) %>%
  mutate(casualties = round(fatal_count + serious_injury_count + minor_injury_count), 
         crashes = round(ID = sum(ID, na.rm = T)))

However, when I run it I get this error:

Error: Problem with `mutate()` column `Crashes`.
i `Crashes = round(ID = sum(ID, na.rm = T))`.
x object 'ID' not found

You can't assign a variable inside `round`. Replace `crashes = round(ID = sum(ID, na.rm = T))` with `crashes = round(sum(ID, na.rm = T))` — Eric, Oct 10 '21 at 07:22
Shouldn't the `casualties` output for `Buller` be `3`? In your expected output it is `1`. — TarJae, Oct 10 '21 at 08:14

score 2 · Answer 1 · answered Oct 10 '21 at 08:31

We could do it this way:

library(dplyr)

df %>% 
  group_by(TA_name, TA_code) %>%
  # add the number of rows (crashes) per group as a column
  add_count(name = "crashes") %>% 
  summarise(across(contains("count"), sum),
            # the count columns are already summed here, so this adds the three totals
            casualties = sum(fatal_count, serious_injury_count, minor_injury_count),
            crashes = unique(crashes))

  TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties crashes
  <chr>     <int>       <int>                <int>              <int>      <int>   <int>
1 Buller       62           1                    1                  1          3       1
2 Clutha       63           1                    2                  3          6       2
3 Grey         61           3                    1                  2          6       2
4 Otago        64           1                    1                  0          2       1
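
For context, the error in the question most likely arises because `summarise()` returns only the grouping columns and the summaries it computes, so `ID` no longer exists by the time `mutate()` runs. A minimal sketch, assuming each crash has a unique `ID`, counts the crashes inside `summarise()` with `n_distinct(ID)` while the column is still available. The name `crashes_stats` is only illustrative, and the data is rebuilt from the `data` block in this thread.

```r
library(dplyr)

# Sample data rebuilt from the "data" block in this thread
df <- data.frame(
  TA_name = c("Grey", "Buller", "Grey", "Clutha", "Clutha", "Otago"),
  TA_code = c(61L, 62L, 61L, 63L, 63L, 64L),
  fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L),
  serious_injury_count = c(0L, 1L, 1L, 1L, 1L, 1L),
  minor_injury_count = c(1L, 1L, 1L, 1L, 2L, 0L),
  ID = 1:6
)

crashes_stats <- df %>%
  group_by(TA_name, TA_code) %>%
  summarise(
    # sum every column whose name contains "count"
    across(contains("count"), ~ sum(.x, na.rm = TRUE)),
    # ID is still available inside summarise(), so count crashes here
    # (assumption: one unique ID per crash)
    crashes = n_distinct(ID),
    .groups = "drop"
  ) %>%
  # the summed count columns survive summarise(), so they can be added in mutate()
  mutate(casualties = fatal_count + serious_injury_count + minor_injury_count)

crashes_stats
```

If every row is known to be a distinct crash, `n()` gives the same count without referring to `ID` at all.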

score 1 · Answer 2 · answered Oct 10 '21 at 08:08

You may use -

library(dplyr)

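df %>%
  group_by(TA_name, TA_code) %>%
  # sum each count column per TA and count the rows (one row per crash);
  # the lambda form avoids passing extra arguments through across(), which newer dplyr deprecates
  summarise(across(fatal_count:minor_injury_count, ~ sum(.x, na.rm = TRUE)),
            crashes = n(), .groups = 'drop') %>%
  # total casualties across the three summed columns
  mutate(casualties = rowSums(select(., fatal_count:minor_injury_count)))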

#  TA_name TA_code fatal_count serious_injury_count minor_injury_count crashes casualties
#  <chr>     <int>       <int>                <int>              <int>   <int>      <dbl>
#1 Buller       62           1                    1                  1       1          3
#2 Clutha       63           1                    2                  3       2          6
#3 Grey         61           3                    1                  2       2          6
#4 Otago        64           1                    1                  0       1          2

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
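
One detail worth noting: the question lists `TA_code` with a leading zero (`061`), while the data above stores it as an integer (`61L`), so the zero is lost in the output. If the three-digit form matters, a small sketch, assuming the codes are always meant to be three digits wide, pads them back with `sprintf()`. The name `TA_code_chr` is hypothetical.

```r
# TA codes as they appear in the answer's output (integer column)
TA_code <- c(62L, 63L, 61L, 64L)

# Pad to three digits with leading zeros; the result is a character vector
TA_code_chr <- sprintf("%03d", TA_code)
TA_code_chr
```

Storing the code as character from the start (for example when reading the file) would avoid the conversion altogether.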

score 1 · Answer 3 · answered Oct 10 '21 at 18:15

Using base R

out <- aggregate(.~ TA_name + TA_code, df[setdiff(names(df), "ID")], sum)
out$casualties <- rowSums(out[, -(1:2)])

-output

> out
  TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties
1    Grey      61           3                    1                  2          6
2  Buller      62           1                    1                  1          3
3  Clutha      63           1                    2                  3          6
4   Otago      64           1                    1                  0          2

data

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
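
The base R output above does not include the `crashes` column the question asks for. A minimal sketch of one way to add it, assuming each row of `df` is one crash: count the rows per area with a second `aggregate()` call and `merge()` the result in. The name `n_crashes` is only illustrative.

```r
# Sample data rebuilt from the "data" block above
df <- data.frame(
  TA_name = c("Grey", "Buller", "Grey", "Clutha", "Clutha", "Otago"),
  TA_code = c(61L, 62L, 61L, 63L, 63L, 64L),
  fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L),
  serious_injury_count = c(0L, 1L, 1L, 1L, 1L, 1L),
  minor_injury_count = c(1L, 1L, 1L, 1L, 2L, 0L),
  ID = 1:6
)

# Sum the three count columns per area, as in the answer above
out <- aggregate(. ~ TA_name + TA_code, df[setdiff(names(df), "ID")], sum)
out$casualties <- rowSums(out[, -(1:2)])

# Count the rows (one per crash) in each area
n_crashes <- aggregate(ID ~ TA_name + TA_code, df, length)
names(n_crashes)[names(n_crashes) == "ID"] <- "crashes"

# Join the crash counts onto the summed columns
out <- merge(out, n_crashes, by = c("TA_name", "TA_code"))
out
```

Note that `merge()` sorts the result by the key columns by default, so the row order may differ from the output shown above.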
