Using group_by in summarize

Question

Here is my df.

Airline	Destination	delayed	ontime	Total_Arrivals
Alaska	Los Angelos	62	497	559
Alaska	Phoenix	12	221	233
Alaska	San Diego	20	212	232
Alaska	San Francisco	102	503	605
Alaska	Seatlle	305	1841	2146
AM West	Los Angelos	117	694	811
AM West	Phoenix	415	4840	5255
AM West	San Diego	65	383	448
AM West	San Francisco	129	320	449
AM West	Seatlle	61	201	262

This is my desired dataframe

Airline	delayed	ontime	Total_Arrivals
Alaska	501	3274	3775
AM West	787	6438	7225

Here is my code

      Overview_All_Flights_test<- summarize(Airline_Arrivals_Wide
    ,group_by(Airline)
    , Airline
    , Total_Ontime_by_Airline=sum(ontime)
    , Total_Delayed_by_Airline=sum(delayed)
    , Total_Arrivals_by_Airline= sum(Total_Arrivals))

The group_by produces an error

Error in summarize():
ℹ In argument: group_by(Airline).
ℹ In row 1.
Caused by error in UseMethod():
! no applicable method for 'group_by' applied to an object of class "character"

Backtrace:
dplyr::summarize(...)
dplyr::group_by(Airline)

Can group_by not be used inside summarize like this? What is wrong with my code, and how do I change it to produce the desired df?

Thanks!

You need to apply `group_by` to the data frame, not pass it as a separate argument. As a single call it would be: `summarize(group_by(Airline_Arrivals_Wide, Airline), Total_Ontime_by_Airline=sum(ontime), ...)` – Dave2e, Feb 25 '23 at 18:53
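Spelled out in full, the single call from the comment above might look like the sketch below. The data frame is rebuilt by hand from the table in the question (destination spellings kept as posted), and the output column names are the ones from the question's own code.

```r
library(dplyr)

# The data frame from the question, rebuilt by hand
Airline_Arrivals_Wide <- data.frame(
  Airline = rep(c("Alaska", "AM West"), each = 5),
  Destination = rep(
    c("Los Angelos", "Phoenix", "San Diego", "San Francisco", "Seatlle"),
    times = 2
  ),
  delayed = c(62, 12, 20, 102, 305, 117, 415, 65, 129, 61),
  ontime = c(497, 221, 212, 503, 1841, 694, 4840, 383, 320, 201),
  Total_Arrivals = c(559, 233, 232, 605, 2146, 811, 5255, 448, 449, 262)
)

# group_by() wraps the data frame, and its result becomes
# the first argument of summarize()
Overview_All_Flights_test <- summarize(
  group_by(Airline_Arrivals_Wide, Airline),
  Total_Ontime_by_Airline = sum(ontime),
  Total_Delayed_by_Airline = sum(delayed),
  Total_Arrivals_by_Airline = sum(Total_Arrivals)
)

print(Overview_All_Flights_test)
```

The original call failed because `group_by(Airline)` was evaluated as one of the summary expressions, so `group_by` received the character column `Airline` instead of a data frame.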

DPH · Accepted Answer · 2023-02-25T19:11:39.763

The conventional way is to call dplyr::group_by() before dplyr::summarise(), like this:

df %>%
    # build groupings
    dplyr::group_by(col_name) %>%
    # perform the operation per group
    dplyr::summarise(sum(to_sum_column)) %>%
    # release any remaining groupings to prevent unwanted behaviour downstream
    dplyr::ungroup()

Since dplyr 1.1, however, you can fold the grouping into the dplyr::summarise() call itself:

df %>%
    # no separate dplyr::group_by() and dplyr::ungroup() needed
    dplyr::summarise(new_col = sum(to_sum_column), .by = to_group_by)
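Applied to the data in the question, both forms might look like the sketch below. The data frame is rebuilt by hand from the question's table, the `.by` form assumes dplyr 1.1.0 or later is installed, and the names `by_group` and `by_arg` are only illustrative.

```r
library(dplyr)

# The data frame from the question, rebuilt by hand
Airline_Arrivals_Wide <- data.frame(
  Airline = rep(c("Alaska", "AM West"), each = 5),
  Destination = rep(
    c("Los Angelos", "Phoenix", "San Diego", "San Francisco", "Seatlle"),
    times = 2
  ),
  delayed = c(62, 12, 20, 102, 305, 117, 415, 65, 129, 61),
  ontime = c(497, 221, 212, 503, 1841, 694, 4840, 383, 320, 201),
  Total_Arrivals = c(559, 233, 232, 605, 2146, 811, 5255, 448, 449, 262)
)

# Classic form: group, summarise, then release the grouping
by_group <- Airline_Arrivals_Wide %>%
  group_by(Airline) %>%
  summarise(
    delayed = sum(delayed),
    ontime = sum(ontime),
    Total_Arrivals = sum(Total_Arrivals)
  ) %>%
  ungroup()

# Per-operation grouping via .by (requires dplyr >= 1.1.0)
by_arg <- Airline_Arrivals_Wide %>%
  summarise(
    delayed = sum(delayed),
    ontime = sum(ontime),
    Total_Arrivals = sum(Total_Arrivals),
    .by = Airline
  )

print(by_group)
print(by_arg)
```

The row order of the two results may differ, since group_by() sorts the group keys while `.by` keeps them in order of first appearance.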

edited Feb 25 '23 at 19:11

answered Feb 25 '23 at 18:44

DPH


Thanks. Is it possible to use summarize twice with the '%>%' pipe, like 'dplyr::summarise(sum(to_sum_column)) %>%' followed by another 'dplyr::summarise(sum(to_sum_column)) %>%'? – g.senorsenor Feb 25 '23 at 18:52
If you want to group by more than one column, use "dplyr::group_by(col_name_1, col_name_2) %>%". If you sequence groupings in distinct calls, they just overwrite each other (the grouping of the last call is the effective one). It is similar with summarise: "dplyr::summarise(sum1 = sum(col1), sum2 = sum(col2))". If you want the sum of sums, you have to build a new grouping or release the current one after the first dplyr::summarise call. – DPH Feb 25 '23 at 18:55
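A small sketch of the points in that comment, using a made-up toy data frame (`flights` and its values are hypothetical, chosen only to keep the sums easy to check):

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
flights <- data.frame(
  Airline = c("Alaska", "Alaska", "Alaska", "AM West", "AM West"),
  Destination = c("Phoenix", "Phoenix", "San Diego", "Phoenix", "San Diego"),
  delayed = c(1, 2, 3, 4, 5)
)

# A second group_by() replaces the first by default,
# so only Destination is left as a grouping column here
regrouped <- flights %>%
  group_by(Airline) %>%
  group_by(Destination)
print(group_vars(regrouped))

# Both columns in one call: one group per Airline/Destination pair
per_pair <- flights %>%
  group_by(Airline, Destination) %>%
  summarise(delayed = sum(delayed), .groups = "drop")

# Sum of sums: build a new grouping on the first summary, then summarise again
per_airline <- per_pair %>%
  group_by(Airline) %>%
  summarise(delayed = sum(delayed), .groups = "drop")

print(per_pair)
print(per_airline)
```

Passing `.groups = "drop"` releases the grouping inside summarise() itself, which is one alternative to a trailing ungroup().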
Thanks, that worked. It also works without the ungroup, though. – g.senorsenor Feb 25 '23 at 19:03
@g.senorsenor Yes, it will work without the dplyr::ungroup(), but then any groupings that remain are kept as a property of the tibble (dplyr's enhanced version of the data.frame). When you process the data further you can run into unwanted behaviour, because dplyr works group-wise whenever groupings are present. Depending on your use you might never notice it (or you might waste some time figuring out why further manipulations or calculations do not pan out as planned). – DPH Feb 25 '23 at 19:08
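One way to see which groupings survive a summarise() is dplyr::group_vars(). The sketch below uses a hypothetical toy data frame (`flights` is made up for illustration). By default summarise() drops the last grouping level, so after grouping by a single column nothing is left to ungroup, while after grouping by two columns the first one remains and affects later verbs.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
flights <- data.frame(
  Airline = c("Alaska", "Alaska", "AM West", "AM West"),
  Destination = c("Phoenix", "San Diego", "Phoenix", "San Diego"),
  delayed = c(3, 3, 4, 5)
)

# One grouping column: the summary comes back with no groups
one_key <- flights %>%
  group_by(Airline) %>%
  summarise(delayed = sum(delayed))
print(group_vars(one_key))

# Two grouping columns: the result is still grouped by Airline
two_keys <- flights %>%
  group_by(Airline, Destination) %>%
  summarise(delayed = sum(delayed), .groups = "drop_last")
print(group_vars(two_keys))

# The leftover grouping changes what sum() means in a later mutate():
# here each share is relative to the airline total ...
share_grouped <- two_keys %>%
  mutate(share = delayed / sum(delayed))

# ... and after ungroup() it is relative to the overall total
share_overall <- two_keys %>%
  ungroup() %>%
  mutate(share = delayed / sum(delayed))
```

In the grouped version the shares add up to 1 within each airline, while in the ungrouped version they add up to 1 across the whole table.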
