Here is my df.

    Airline  Destination    delayed  ontime  Total_Arrivals
    Alaska   Los Angelos         62     497             559
    Alaska   Phoenix             12     221             233
    Alaska   San Diego           20     212             232
    Alaska   San Francisco      102     503             605
    Alaska   Seatlle            305    1841            2146
    AM West  Los Angelos        117     694             811
    AM West  Phoenix            415    4840            5255
    AM West  San Diego           65     383             448
    AM West  San Francisco      129     320             449
    AM West  Seatlle             61     201             262

This is my desired dataframe

    Airline  delayed  ontime  Total_Arrivals
    Alaska       501    3274            3775
    AM West      787    6438            7225

Here is my code

    Overview_All_Flights_test <- summarize(Airline_Arrivals_Wide
    , group_by(Airline)
    , Airline
    , Total_Ontime_by_Airline = sum(ontime)
    , Total_Delayed_by_Airline = sum(delayed)
    , Total_Arrivals_by_Airline = sum(Total_Arrivals))

The group_by produces an error

    Error in summarize():
    ℹ In argument: group_by(Airline).
    ℹ In row 1.
    Caused by error in UseMethod():
    ! no applicable method for 'group_by' applied to an object of class "character"
    Backtrace:
      1. dplyr::summarize(...)
      2. dplyr::group_by(Airline)

Can `group_by` not be used this way, and what do I need to change in my code to produce the desired data frame?

Thanks!

  • You need to apply `group_by` to the data frame, not pass it as a separate argument. As a single call it would be: `summarize(group_by(Airline_Arrivals_Wide, Airline), Total_Ontime_by_Airline=sum(ontime), ...)` – Dave2e Feb 25 '23 at 18:53
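
Applied to the data in the question, the nested call suggested in the comment above might look like the sketch below. It assumes dplyr is installed; the data frame is rebuilt by hand from the table in the question (destination spellings kept as posted), and the output column names are the ones used in the question's code.

```r
library(dplyr)

# Rebuild the data frame from the question (values copied from the posted table).
Airline_Arrivals_Wide <- data.frame(
  Airline = rep(c("Alaska", "AM West"), each = 5),
  Destination = rep(
    c("Los Angelos", "Phoenix", "San Diego", "San Francisco", "Seatlle"),
    times = 2
  ),
  delayed = c(62, 12, 20, 102, 305, 117, 415, 65, 129, 61),
  ontime = c(497, 221, 212, 503, 1841, 694, 4840, 383, 320, 201),
  Total_Arrivals = c(559, 233, 232, 605, 2146, 811, 5255, 448, 449, 262)
)

# group_by() receives the data frame first and the grouping column second;
# summarize() then works on the grouped result, one output row per airline.
Overview_All_Flights_test <- summarize(
  group_by(Airline_Arrivals_Wide, Airline),
  Total_Ontime_by_Airline = sum(ontime),
  Total_Delayed_by_Airline = sum(delayed),
  Total_Arrivals_by_Airline = sum(Total_Arrivals)
)

print(Overview_All_Flights_test)
```

The grouping column `Airline` is carried into the result automatically, so it does not need to be listed as a separate argument to `summarize()`.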

1 Answer

The conventional way is to call `dplyr::group_by()` before `dplyr::summarise()`, like this:

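    library(dplyr)  # provides %>%, group_by(), summarise(), ungroup()

    # df, col_name and to_sum_column are placeholders for your own names
    df %>%
        # build groupings
        dplyr::group_by(col_name) %>%
        # perform the operation per group
        dplyr::summarise(sum(to_sum_column)) %>%
        # release groupings to prevent unwanted behaviour downstream
        dplyr::ungroup()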

Since dplyr 1.1.0, however, you can fold the grouping into the `dplyr::summarise()` call itself with the `.by` argument:

    df %>%
        # no separate dplyr::group_by() and dplyr::ungroup() needed
        dplyr::summarise(new_col = sum(to_sum_column), .by = to_group_by)
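
For the data in the question, the `.by` form could be sketched as follows. This assumes dplyr 1.1.0 or later; the names `arrivals` and `by_airline` are made up for this illustration, and `across()` is used here only so that the result keeps the column names of the desired data frame.

```r
library(dplyr)

# Hypothetical stand-in for the question's data frame (values from the posted table).
arrivals <- data.frame(
  Airline = rep(c("Alaska", "AM West"), each = 5),
  delayed = c(62, 12, 20, 102, 305, 117, 415, 65, 129, 61),
  ontime = c(497, 221, 212, 503, 1841, 694, 4840, 383, 320, 201),
  Total_Arrivals = c(559, 233, 232, 605, 2146, 811, 5255, 448, 449, 262)
)

# .by groups only for this one summarise() call; the result is a plain,
# ungrouped data frame, so no ungroup() is needed afterwards.
by_airline <- arrivals %>%
  summarise(
    across(c(delayed, ontime, Total_Arrivals), sum),
    .by = Airline
  )

print(by_airline)
```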
DPH
  • Thanks. Is it possible to use summarize twice with the `%>%` pipe, e.g. `dplyr::summarise(sum(to_sum_column)) %>%` followed by another `dplyr::summarise(sum(to_sum_column)) %>%`? – g.senorsenor Feb 25 '23 at 18:52
  • If you want to group by more than one column, use `dplyr::group_by(col_name_1, col_name_2) %>%`. If you sequence groupings in distinct calls, they just overwrite each other (the grouping of the last call is the effective one). `summarise` is similar: `dplyr::summarise(sum1 = sum(col1), sum2 = sum(col2))`. If you want the sum of sums, you have to build a new grouping or release the current one after the first `dplyr::summarise` call. – DPH Feb 25 '23 at 18:55
  • Thanks, that worked. It also works without the ungroup, though. – g.senorsenor Feb 25 '23 at 19:03
  • @g.senorsenor Yes, it will work without `dplyr::ungroup()`, but then the groupings are kept as a property of the tibble (dplyr's enhanced version of the data.frame). When you process the data further you can run into unwanted behaviour, because dplyr works group-wise whenever groupings are present. Depending on your use you might never notice it (or you might waste some time figuring out why further manipulations or calculations do not pan out as planned). – DPH Feb 25 '23 at 19:08
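
To make the grouping behaviour discussed in these comments concrete, here is a small sketch that inspects which groupings survive a `summarise()` call. One detail worth checking against your own dplyr version: as far as I know, `summarise()` drops the last grouping level by default, so with a single grouping column the result is already ungrouped, while with two grouping columns it stays grouped by the first. The data frame `flights` and the result names are made up for this illustration and use a four-row subset of the question's values.

```r
library(dplyr)

# Hypothetical four-row subset of the question's data.
flights <- data.frame(
  Airline = c("Alaska", "Alaska", "AM West", "AM West"),
  Destination = c("Phoenix", "San Diego", "Phoenix", "San Diego"),
  delayed = c(12, 20, 415, 65)
)

# One grouping column: the only grouping level is dropped by summarise().
one_key <- flights %>%
  group_by(Airline) %>%
  summarise(delayed = sum(delayed))

# Two grouping columns: only the last level (Destination) is dropped,
# so the result is still grouped by Airline.
two_keys <- flights %>%
  group_by(Airline, Destination) %>%
  summarise(delayed = sum(delayed), .groups = "drop_last")

print(group_vars(one_key))   # remaining grouping columns, if any
print(group_vars(two_keys))

# "Sum of sums": release the remaining grouping first, otherwise the second
# summarise() would again return one row per Airline.
grand_total <- two_keys %>%
  ungroup() %>%
  summarise(delayed = sum(delayed))

print(grand_total)
```

Calling `ungroup()` explicitly, or passing `.groups = "drop"` to `summarise()`, removes the need to remember which levels are left.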