0

I got stuck in one task.

Now I have a variable called placement which has two categories, out of home and at home (A and B for short here). A case can be placed at AAABAB sequentially. Each placement has a starting date and ending date, for instance, 01/20/2015-2/21/2015 for first A, 2/21/2015-8/05/2015 for second A....etc.

Now I structured data like this, 111.1. (A as 1 and B as missing). How can I combine out-of-home placement time? That is, use the first A's starting date as starting date, use the third A's ending date as ending date. This will be the first period of time that the case is out-of-home. I am thinking about using a do loop to do this. But cannot figure out how.

I think, another way to do this, is to identify out-of-home groups, for example, AAABAABAAAA can be labeled as 11123345555. Then I can use PROC SQL, min and max function, to find starting date and ending date.

proc sql; create table comb as select *,min(strtdt) as minstrtdt, max(livarenddt) as maxenddt from limt group by caseid,groupid; quit;

Original data, like this, (out indicates out of home, 1 means yes, . means no)

Obs caseid livarstrtdt livarenddt placemnt out

   81    00040903     14SEP2010     09DEC2010     01                   .
   82    00040903     09DEC2010     28FEB2011     02                   1
   83    00040903     28FEB2011     01APR2011     02                   1
   84    00040903     01APR2011     01JUL2011     02                   1
   85    00040903     01JUL2011     08AUG2012     02                   1
   86    00040903     08AUG2012     05NOV2014     02                   1
   87    00040903     05NOV2014     05NOV2014     03                   .
   88    00040903     12AUG2008     13AUG2008     12                   1
   89    00040903     13AUG2008     13AUG2008     01                   .
   90    00040903     13AUG2008     21AUG2008     01                   .

I want the output like this,

Obs caseid livarstrtdt livarenddt placemnt out

   81    00040903     14SEP2010     09DEC2010     01                   .
   82    00040903     09DEC2010     05NOV2014     02                   1

   87    00040903     05NOV2014     05NOV2014     03                   .
   88    00040903     12AUG2008     13AUG2008     12                   1
   89    00040903     13AUG2008     13AUG2008     01                   .
   90    00040903     13AUG2008     21AUG2008     01                   .

Anyone have ideas? Thanks!!

UPDATE,

Now I modified my data, caseid livaropnseq livarstrtdt livarenddt placemnt out

183  00040903      01        14SEP2010   09DEC2010   01                 .
184  00040903      01        09DEC2010   28FEB2011   02                 1
185  00040903      01        28FEB2011   01APR2011   02                 1
186  00040903      01        01APR2011   01JUL2011   02                 1
187  00040903      01        01JUL2011   08AUG2012   02                 1
188  00040903      01        08AUG2012   05NOV2014   02                 1
189  00040903      01        05NOV2014   05NOV2014   03                 .
190  00040903      02        12AUG2008   13AUG2008   12                 .
191  00040903      02        13AUG2008   13AUG2008   02                 1
192  00040903      02        13AUG2008   21AUG2008   02                 1

As you can see, rows of different livaropnseq should not be combined together. For example Row 189 and 190, even though the case was at home at both time period, they belong to different livaropnseq (different open sequences). How can I incorporate this variable into the code in order to avoid collapsing them together? Greatly appreciated!!!

The output should be like this, Obs caseid livaropnseq livarstrtdt livarenddt placemnt out

183  00040903      01        14SEP2010   09DEC2010   01                 .
184  00040903      01        09DEC2010   05NOV2014   02                 1
189  00040903      01        05NOV2014   05NOV2014   03                 .
190  00040903      02        12AUG2008   13AUG2008   12                 .
191  00040903      02        13AUG2008   21AUG2008   02                 1

UPDATE AGAIN, i USED THE following code to get what I want FINALLY!

data spell6;
set spell4;
by caseid livaropnseq out notsorted ;
group + first.out;
if first.caseid then group=1;
if first.out then startdt=livarstrtdt ;
retain startdt;
if last.out;
enddt = livarenddt ;
format startdt enddt date9.;
*drop livarstrtdt livarenddt ;
run;

Thank you so much, EVERYONE!! I really appreciated it!

Sophie
  • 1
  • 3

3 Answers3

0

You can do this in a DATA step with the help of RETAIN and OUTPUT:

DATA dset2 (KEEP=caseid new_start new_stop new_out);
  SET dset1;
  BY caseid;
  RETAIN new_start new_stop new_out current_pl;
  IF first.caseid THEN DO;
    new_start=livarstrtdt;
    new_stop=livarenddt;
    new_out=out;
    current_pl=placemnt;
  END;
  ELSE IF placemnt=current_pl THEN new_stop=livarenddt;
  ELSE DO;
    OUTPUT;
    new_start=livarstrtdt;
    new_stop=livarenddt;
    new_out=out;
    current_pl=placemnt;
  END;
  IF last.caseid THEN OUTPUT;
RUN;

Let me know if I misunderstood. My output would not include obs 89 which looks like it shouldn't have been in the output.

EDIT: I updated the code to keep the "out" variable. Below is my output:

caseid   new_start new_stop  new_out 
00040903 14SEP2010 09DEC2010 . 
00040903 09DEC2010 05NOV2014 1 
00040903 05NOV2014 05NOV2014 . 
00040903 12AUG2008 13AUG2008 1 
00040903 13AUG2008 21AUG2008 .

If this is not what you want, please be more clear.

Dave O
  • 19
  • 4
  • But for some reason, I got the following output – Sophie Oct 21 '16 at 22:18
  • Obs caseid new_start new_stop out 47 00040903 14SEP2010 09DEC2010 1 48 00040903 09DEC2010 05NOV2014 . 49 00040903 05NOV2014 05NOV2014 1 50 00040903 12AUG2008 13AUG2008 . 51 00040903 13AUG2008 21AUG2008 . – Sophie Oct 21 '16 at 22:22
  • The dates seem not correct, and also the first record disappeared, which indicate the case is at home...I am sorry, I donot know how to make output look organized.... – Sophie Oct 21 '16 at 22:24
  • @Sophie put your modifications in your original question rather than the comment and indicate that you've updated it. – Reeza Oct 22 '16 at 00:21
  • The output looks exactly the same as what I WANT! Thank you! – Sophie Oct 22 '16 at 20:54
  • So only the newly created variables are kept in the final table. I included old variables, and compared them with newly created variable. The old variables are not correct anymore, right? – Sophie Oct 22 '16 at 20:56
  • Right. If you need other variables, you can modify the DATA step or JOIN back with the original table. – Dave O Oct 23 '16 at 02:31
  • @DaveO Thanks!!! I compared your codes and Tom's codes. For this case, the results are the same. but When I apply to all my cases, the results are different. I am trying to figure out why. The logic behinds those codes seem correct to me...Do you think so? Thank you!!! – Sophie Oct 24 '16 at 00:56
  • I would guess it's the order of the source data. For both codes, the data should be sorted by caseid then livarstrtdt. – Dave O Oct 24 '16 at 19:12
0

You can use the NOTSORTED option on a BY statement in your DATA step to let SAS figure out when they change location. You didn't say what to do with the extra variable PLACEMNT so this code will just keep the value that it had on the last observation in the group.

data have;
  input caseid $ livarstrtdt livarenddt placemnt $ out ;
  informat livarstrtdt livarenddt date9. ;
  format livarstrtdt livarenddt date9. ;
cards;
00040903 14SEP2010 09DEC2010 01 .
00040903 09DEC2010 28FEB2011 02 1
00040903 28FEB2011 01APR2011 02 1
00040903 01APR2011 01JUL2011 02 1
00040903 01JUL2011 08AUG2012 02 1
00040903 08AUG2012 05NOV2014 02 1
00040903 05NOV2014 05NOV2014 03 .
00040903 12AUG2008 13AUG2008 12 1
00040903 13AUG2008 13AUG2008 01 .
00040903 13AUG2008 21AUG2008 01 .
;;;;

data want ;
  set have ;
  by caseid out notsorted ;
  group + first.out;
  if first.caseid then group=1;
  if first.out then startdt=livarstrtdt ;
  retain startdt;
  if last.out;
  enddt = livarenddt ;
  format startdt enddt date9.;
  drop livarstrtdt livarenddt ;
run;

Results

Obs     caseid     placemnt    out    group      startdt        enddt
 1     00040903       01        .       1      14SEP2010    09DEC2010
 2     00040903       02        1       2      09DEC2010    05NOV2014
 3     00040903       03        .       3      05NOV2014    05NOV2014
 4     00040903       12        1       4      12AUG2008    13AUG2008
 5     00040903       01        .       5      13AUG2008    21AUG2008
Tom
  • 47,574
  • 2
  • 16
  • 29
  • Thank you very much!!! It's extremely helpful! I can get the output I want. The only thing I also want is, can I keep the value of the first observation if out is missing (case at home)? – Sophie Oct 22 '16 at 20:20
  • Or, when case is at home, we keep all rows, and only combine rows that case is not at home. In the example, there would be 6 rows in the results. Thank you so much!!! . – Sophie Oct 22 '16 at 20:22
0
data spell6;
set spell4;
by caseid livaropnseq out notsorted ;
group + first.out;
if first.caseid then group=1;
if first.out then startdt=livarstrtdt ;
retain startdt;
if last.out;
enddt = livarenddt ;
format startdt enddt date9.;
*drop livarstrtdt livarenddt ;
run;

THANK YOU, TOM! and everyone else!!!

Sophie
  • 1
  • 3