2

My data is currently in the form:

 ID     Fill1          Fill2         Fill3         Fill4          Fill5     
 1      01JAN2014      28JAN2014     26FEB2014     .              .
 2      .              05FEB2012     03MAR2012     02APR2012      01MAY2012
 3      10MAR2015      08APR2015     07MAY2015     05JUN2015      03JUL2015
 4      .              .             20FEB2013     18MAR2013      .         

And I am trying to create treatment "episodes" per ID. In other words, for each ID I want to find the first and last non-empty Fills and then calculate the difference between the two dates. For example for ID=1 I need to find the time difference between 01JAN2014 and 26FEB2014. That is,

Fill1 - Fill3 = episodeduration

but for ID=4 I would need to find,

Fill3 - Fill4 = episodeduration

where episodeduration is a new variable created. I have over 30k unique IDs with varying "first" and "last" Fill dates. Thanks in advance for you help.

Justin
  • 55
  • 2
  • 7
  • Have you made any attempt to solve this problem yourself? If so, what have you tried? This site does not aim to provide a code-on-request service. – user667489 Sep 22 '15 at 19:03
  • 1
    Look at min/max functions – Reeza Sep 22 '15 at 19:27
  • @user667489 I am well aware of what this site's aim is. You seem to have negative reaction to many of my posts, whereas 99% of others are engaging and happy with my questions. That said, I do not feel obligated to answer to the mildly offensive question you have posed. G'day. – Justin Sep 22 '15 at 19:50
  • @Reeza thank you for the feedback..I will check out min/max functions and see where I can get. I appreciate it. – Justin Sep 22 '15 at 19:51
  • 1
    @Justin I hope you don't take offense to user667489's comment as I think they are just trying to be helpful. The truth is that the SAS community on SO is fairly small and you can often get a quicker and more specific answer if you post any code you've tried. If you haven't written any code to solve your problem, that's fine. As long as you've clearly identified your problem and made a reasonable effort to find a solution before posting, your question is welcome. – DWal Sep 23 '15 at 03:08

3 Answers3

2
data have;
input Id Fill1 date9. Fill2 date9. Fill3 date9. Fill4 date9. Fill5 date9.;
format Fill1 -  Fill5 date9.;
cards;
1 01JAN201428JAN201426FEB2014 
2          05FEB201203MAR201202APR201201MAY2012
3 10MAR201508APR201507MAY201505JUN201503JUL2015
4                   20FEB201318MAR2013 
;
run;

data want;
set have;
    array fill {5};
    format first last date9.;

    do i = 1 to dim(fill);
       first=coalesce(first, fill(i));
       last=coalesce(fill(i), last);
    end;

    episodeduration = last - first;

    drop i;
run;

Use array statement to create array and loop through variables and coalesce() function to find first/last non missing.

Comment: this code will find first/last by going from first to last variable. If you need first/last in terms of dates, min and max functions are good: min(of fill1 -- fill5); - no need to loop.

vasja
  • 4,732
  • 13
  • 15
  • hi @vasja thank you so much, this is the exact logic I was looking for. Your example works perfect, however when i go to run this same code on my dataset I am getting empty values everywhere in the output (fill1, fill2,...first, last, episodeduration. In my actual dataset, "ID" is not 1, 2, 3 but rather a long patient identifier such as 00023432. Could this be the reason? – Justin Sep 22 '15 at 20:26
  • the only note appearing in the log is: "missing values were generated as a result of performing an operation on missing values. Each place is given by: (number of times) at (line):(column)." – Justin Sep 22 '15 at 20:36
  • You mean running just second data step on your data? Make sure your variables are exactly fill1, fill2... fill5 and they are numeric. In case they are character you'd better convert them to SAS date numeric first. – vasja Sep 23 '15 at 19:59
1

vasja's SAS version looks pretty nice, here's how it could be done SQL side (which is pretty much exactly the same procedure).

Select *, 
    DATEDIFF(day, 
         CONVERT(date,COALESCE(date1, date2, date3, date4, date5)),
         CONVERT(date, COALESCE(date5,date4,date3,date2,date1))
    )
from SomeTableNameAboutEpisodes;

Basically, you use coalesce to find the first non-null value, and you convert it into a date. You then take the difference between the 2 dates. This however only works if the empty cells have no values (null) and that there is no empty line. (you could simply put an ISNULL(DATEDIF(...), 0) though).

Julien Blanchard
  • 825
  • 5
  • 18
0

You can use a descending "SAS Variable List" (FILL5-FILL1) to make this a bit easier.

data diff;
   set have;
   first = coalesce(of fill1-fill5);
   i     = whichn(first,of fill1-fill5);
   last  = coalesce(of fill5-fill1);
   j     = 6-whichn(last, of fill5-fill1);
   format first last date9.;
   run;

Obs    Id        Fill1        Fill2        Fill3        Fill4        Fill5        first    i         last    j

 1      1    01JAN2014    28JAN2014    26FEB2014            .            .    01JAN2014    1    26FEB2014    3
 2      2            .    05FEB2012    03MAR2012    02APR2012    01MAY2012    05FEB2012    2    01MAY2012    5
 3      3    10MAR2015    08APR2015    07MAY2015    05JUN2015    03JUL2015    10MAR2015    1    03JUL2015    5
 4      4            .            .    20FEB2013    18MAR2013            .    20FEB2013    3    18MAR2013    4
data _null_
  • 8,534
  • 12
  • 14