1

I need to delete all the rows with a max value of duty_perd_id where the rotn_prng_nbr and empl_nbr are the same (not the same to each other, but the max where of all of the rows where those two remain constant). From the table below it should delete rows 3,7 and 9.

rotn_prng_nbr empl_nbr duty_perd_id
B93 12 1
B93 12 2
B93 12 3
B21 12 1
B21 12 2
B21 12 3
B21 12 4
B21 18 1
B21 18 2

using SAS EG. Right now all have is below:

Option 1:

 create table middle_legs as
    select t.*
    from actual_flt_leg as t
    where t.duty_perd_id < (select max(t2.duty_perd_id)
                            from actual_flt_leg as t2
                            where t2.rotn_prng_nbr = t.rotn_prng_nbr and
                                  t2.empl_nbr = t.empl_nbr
                           );

this works exactly as intended, but is incredibly slow. The other thought that I had but couldnt quite finish was as follows.

Option 2:

create table last_duty_day as
Select * from actual_flt_leg 
inner join (
    select actual_flt_leg.Rotn_Prng_Nbr,actual_flt_leg.empl_nbr, max(duty_perd_id) as last_duty
    from actual_flt_leg
    group by actual_flt_leg.Rotn_Prng_Nbr, actual_flt_leg.empl_nbr
) maxtable on 
    actual_flt_leg.Rotn_Prng_Nbr = maxtable.Rotn_Prng_Nbr
    and actual_flt_leg.empl_Nbr = maxtable.empl_Nbr
    and actual_flt_leg.duty_perd_id = maxtable.last_duty;

option 2 finds all the highest duty_perd_id for the given pair, and I was wondering if there was any "reverse join" that could only show the rows from the original table that do not match this new table i created in option 2.

If there is a way to make option 1 faster, finish option 2, or anything else i cant think of id appreciate it. Thanks!

2 Answers2

1

You are almost there. You just want <:

Select *
from actual_flt_leg inner join
     (select actual_flt_leg.Rotn_Prng_Nbr,actual_flt_leg.empl_nbr, max(duty_perd_id) as last_duty
      from actual_flt_leg
      group by actual_flt_leg.Rotn_Prng_Nbr, actual_flt_leg.empl_nbr
     ) maxtable 
     on actual_flt_leg.Rotn_Prng_Nbr = maxtable.Rotn_Prng_Nbr and
        actual_flt_leg.empl_Nbr = maxtable.empl_Nbr and
        actual_flt_leg.duty_perd_id < maxtable.last_duty;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I tend to prefer the `<` be in a `where` not the `on`, just because it's more descriptive of what you really want ("join these two things to get this value on the table, then select the rows where the value is less than the other value") - should work the same under the hood though I imagine. – Joe Jun 14 '21 at 16:07
1

In SAS SQL, this is pretty easy:

data have;
input rotn_prng_nbr $ empl_nbr duty_perd_id;
datalines;
B93 12 1
B93 12 2
B93 12 3
B21 12 1
B21 12 2
B21 12 3
B21 12 4
B21 18 1
B21 18 2
;;;;
run;

proc sql;
  select *
    from have
    group by rotn_prng_nbr, empl_nbr
    having duty_perd_id lt max(duty_perd_id);
quit;

This isn't legit SQL in any other system that I've ever seen, but it works in SAS. You can group by a set of variables while still using select for all of the variables including ones not on group by; SAS just does two queries and merges them behind the scenes for you.

NOTE: The query requires remerging summary statistics back with the original data.

As far as I understand the actual under the hood result is exactly identical to the more "compatible" version Gordon suggests; just a matter of whether you prefer typing less or more compatible SQL code.

Joe
  • 62,789
  • 6
  • 49
  • 67