I have a dataframe of crash statistics called crashes_TA. The dataframe looks like the following, but on a much larger scale, with each row representing one crash.

TA_name TA_code fatal_count serious_injury_count minor_injury_count ID
Grey 061 2 0 1 1
Buller 062 1 1 1 2
Grey 061 1 1 1 3
Clutha 063 0 1 1 4
Clutha 063 1 1 2 5
Otago 064 1 1 0 6

I would like to sum the fatal, serious, and minor injury counts by TA_name and add a new column called casualties that holds their total. I would also like to count the IDs, which gives the number of crashes per region. This value differs from casualties because not all crashes have casualties. This new column would be called crashes.

My new dataframe would then look like this:

TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties crashes
Grey 061 3 1 2 6 2
Buller 062 1 1 1 3 1
Clutha 063 1 2 3 6 2
Otago 064 1 1 0 2 1

This is the code I have tried so far:

crashes_stats_TA <- crashes_TA %>% 
  group_by(TA_code, TA_name) %>%
  summarise(across(contains("count"), ~sum(., na.rm = T)),
            across(Population, ~mean(., na.rm = T),
            across(contains("perc"), ~mean(., na.rm = T), .names = "{.col}_mean"))) %>%
  mutate(casualties = round(fatal_count + serious_injury_count + minor_injury_count), 
         crashes = round(ID = sum(ID, na.rm = T)))

However, when I do this I get this error:

Error: Problem with `mutate()` column `Crashes`.
i `Crashes = round(ID = sum(ID, na.rm = T))`.
x object 'ID' not found

marc_s
  • You can't assign a variable inside `round`. Replace `crashes = round(ID = sum(ID, na.rm = T))` with `crashes = round(sum(ID, na.rm = T))` – Eric Oct 10 '21 at 07:22
  • Shouldn't the `casualties` output for `Buller` be `3`? In your expected output it is `1`. – TarJae Oct 10 '21 at 08:14
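
A note on the error itself: `summarise()` keeps only the grouping columns and the columns it creates, so `ID` no longer exists by the time `mutate()` runs. The following is a minimal sketch of one way around that, assuming the sample data from the question; it counts the crashes inside `summarise()` with `n_distinct(ID)`, and it leaves out the `Population` and `perc` columns because they are not in the sample.

```r
library(dplyr)

# Sample data rebuilt from the question (assumed to be representative).
crashes_TA <- data.frame(
  TA_name = c("Grey", "Buller", "Grey", "Clutha", "Clutha", "Otago"),
  TA_code = c("061", "062", "061", "063", "063", "064"),
  fatal_count = c(2, 1, 1, 0, 1, 1),
  serious_injury_count = c(0, 1, 1, 1, 1, 1),
  minor_injury_count = c(1, 1, 1, 1, 2, 0),
  ID = 1:6
)

crashes_stats_TA <- crashes_TA %>%
  group_by(TA_code, TA_name) %>%
  summarise(
    # Sum every column whose name contains "count".
    across(contains("count"), ~ sum(.x, na.rm = TRUE)),
    # ID is still available here, so count the crashes before it is dropped.
    crashes = n_distinct(ID),
    .groups = "drop"
  ) %>%
  # The count columns now hold group totals, so a plain sum gives casualties.
  mutate(casualties = fatal_count + serious_injury_count + minor_injury_count)
```

Counting with `n_distinct(ID)` rather than summing `ID` matters here: the IDs are identifiers, so their sum has no meaning as a number of crashes.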

3 Answers

We could do it this way:

library(dplyr)

df %>% 
  group_by(TA_name, TA_code) %>%
  add_count(name="crashes") %>% 
  summarise(across(contains("count"), sum),
            casualties = sum(fatal_count, serious_injury_count, minor_injury_count),
            crashes = unique(crashes))

  TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties crashes
  <chr>     <int>       <int>                <int>              <int>      <int>   <int>
1 Buller       62           1                    1                  1          3       1
2 Clutha       63           1                    2                  3          6       2
3 Grey         61           3                    1                  2          6       2
4 Otago        64           1                    1                  0          2       1
TarJae

You may use:

library(dplyr)

df %>%
  group_by(TA_name, TA_code) %>%
  summarise(across(fatal_count:minor_injury_count, ~ sum(.x, na.rm = TRUE)),
            crashes = n(), .groups = 'drop') %>%
  mutate(casualties = rowSums(select(., fatal_count:minor_injury_count)))

#  TA_name TA_code fatal_count serious_injury_count minor_injury_count crashes casualties
#  <chr>     <int>       <int>                <int>              <int>   <int>      <dbl>
#1 Buller       62           1                    1                  1       1          3
#2 Clutha       63           1                    2                  3       2          6
#3 Grey         61           3                    1                  2       2          6
#4 Otago        64           1                    1                  0       1          2

data

It is easier to help if you provide data in a reproducible format:

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
Ronak Shah

Using base R

out <- aggregate(.~ TA_name + TA_code, df[setdiff(names(df), "ID")], sum)
out$casualties <- rowSums(out[, -(1:2)])

Output:

> out
  TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties
1    Grey      61           3                    1                  2          6
2  Buller      62           1                    1                  1          3
3  Clutha      63           1                    2                  3          6
4   Otago      64           1                    1                  0          2
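
The output above does not include the `crashes` column the question asks for. As a sketch only, assuming the same `df` as in the data section, one way to add it in base R is to count the IDs per group with a second `aggregate()` call and merge the result back in:

```r
# Same sample data as in the data section, rebuilt so the block runs on its own.
df <- data.frame(
  TA_name = c("Grey", "Buller", "Grey", "Clutha", "Clutha", "Otago"),
  TA_code = c(61L, 62L, 61L, 63L, 63L, 64L),
  fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L),
  serious_injury_count = c(0L, 1L, 1L, 1L, 1L, 1L),
  minor_injury_count = c(1L, 1L, 1L, 1L, 2L, 0L),
  ID = 1:6
)

# Sum the count columns per TA, leaving ID out of the sums.
out <- aggregate(. ~ TA_name + TA_code, df[setdiff(names(df), "ID")], sum)
count_cols <- c("fatal_count", "serious_injury_count", "minor_injury_count")
out$casualties <- rowSums(out[, count_cols])

# Count the rows (crashes) per TA and rename the resulting column.
n_crashes <- aggregate(ID ~ TA_name + TA_code, df, length)
names(n_crashes)[names(n_crashes) == "ID"] <- "crashes"

# Join the crash counts onto the summed table.
out <- merge(out, n_crashes, by = c("TA_name", "TA_code"))
```

Selecting the count columns by name, rather than by position, keeps `rowSums()` from picking up any column added later.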

data

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
akrun