SAS: using group by in proc sql doesn't separate out instances chronologically

Question

Consider the following SAS code:

data test;
    format dt date9.
           ctry_cd $2.
           sn $2.;
    input ctry_cd sn dt;
    datalines;
    US 1 20000
    US 1 20001
    US 1 20002
    CA 1 20003
    CA 1 20004
    US 1 20005
    US 1 20006
    US 1 20007
    ES 2 20001
    ES 2 20002
    ;
run;

proc sql;
    create table check as
    select
        sn,
        ctry_cd,
        min(dt) as begin_dt format date9.,
        max(dt) as end_dt format date9.
    from test
    group by sn, ctry_cd;
quit;

This returns:

1 CA 07OCT2014 08OCT2014
1 US 04OCT2014 11OCT2014
2 ES 05OCT2014 06OCT2014

I would like the proc sql to distinguish between the country moves; that is, to return

1 US 04OCT2014 06OCT2014
1 CA 07OCT2014 08OCT2014
1 US 09OCT2014 11OCT2014
2 ES 05OCT2014 06OCT2014

So it still groups the instances by sn and ctry_cd, but pays attention to the dates so that I get a timeline.

score 1 · Accepted Answer · answered Apr 28 '16 at 23:38

You need to create another grouping variable then, one that increments each time ctry_cd changes from one row to the next:

data test;
  set test;
  prev_ctry_cd=lag(ctry_cd); /* ctry_cd from the previous row */
  if prev_ctry_cd ^= ctry_cd then group+1; /* country changed: start a new group */
run;

proc sql;
    create table check as
    select 
        sn,
        ctry_cd,
        min(dt) as begin_dt format date9.,
        max(dt) as end_dt format date9.
    from test
    group by group, sn, ctry_cd
    order by group;
quit;
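
Note that the counter depends entirely on row order, so this only works if each sn's rows are already in date order. If that is not guaranteed, something along these lines might be safer. It is only a sketch: test_sorted, test_grouped, check_runs and run_id are names I made up for illustration, and it swaps the lag() comparison for BY-group processing with NOTSORTED, which also starts a new run whenever sn changes.

```sas
/* Put each serial number's rows in date order (assumes this is not guaranteed). */
proc sort data=test out=test_sorted;
    by sn dt;
run;

/* Number each consecutive run of the same country within a serial number. */
data test_grouped;
    set test_sorted;
    by sn ctry_cd notsorted;            /* ctry_cd is grouped in runs, not sorted */
    if first.ctry_cd then run_id + 1;   /* sum statement: run_id is retained, starts at 0 */
run;

proc sql;
    create table check_runs as
    select
        sn,
        ctry_cd,
        min(dt) as begin_dt format=date9.,
        max(dt) as end_dt format=date9.
    from test_grouped
    group by run_id, sn, ctry_cd
    order by run_id;
quit;
```

Writing to new datasets instead of overwriting test also means the original data stays available if you need to rerun the step.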

score 0 · Answer 2 · answered Apr 29 '16 at 08:41

If the data is sorted as per your example, then you can achieve your goal in a data step without creating an extra variable.

data want;
    keep sn ctry_cd begin_dt end_dt; /* keeps only the required variables */
    set test;
    by sn ctry_cd notsorted; /* notsorted option needed as ctry_cd is not in order */
    retain begin_dt; /* retains value until needed */
    if first.ctry_cd then begin_dt=dt; /* store first date for each new ctry_cd */
    if last.ctry_cd then do;
        end_dt=dt; /* store last date for each new ctry_cd */
        output; /* output result */
    end;
    format begin_dt end_dt date9.;
run;
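
If you want to see what the by statement is doing, you can copy the automatic first. and last. flags into ordinary variables and print them. This is just a diagnostic sketch; flags, is_first and is_last are illustrative names that are not part of the answer above.

```sas
/* Copy the automatic FIRST./LAST. flags into real variables so they can be printed. */
data flags;
    set test;
    by sn ctry_cd notsorted;
    is_first = first.ctry_cd;   /* 1 on the first row of each run of sn/ctry_cd */
    is_last  = last.ctry_cd;    /* 1 on the last row of each run of sn/ctry_cd */
run;

proc print data=flags noobs;
    var sn ctry_cd dt is_first is_last;
run;

/* With the sample data in its original row order, is_first is 1 on rows 1, 4, 6 and 9
   and is_last is 1 on rows 3, 5, 8 and 10, i.e. four runs and four output rows. */
```

Because notsorted only looks at whether the by values change from one row to the next, the second block of US rows is treated as a new run rather than being merged with the first.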
