0

The data I have looks like the picture below. I have the date, the customer name, and the different categories of fruit that they bought.

Example

I want to create a new column that count how many "double sold" occurred. For example, first interaction with customer A is on 1/15, if the second second interaction is within 7 days of the first interaction and a fruit is sold, I want it to put a 1 in the "double sold" column.

Essentially, I want SAS to match the customer ID, and then go down the rows and look for dates that are within 7 days of the first one.

Joe Chan
  • 133
  • 3
  • 11
  • What have you tried thus far? It'll help others if you add code for the sample data that others can use without having to program the `datalines` themselves, please. Also why the `sql` and `mysql` tags? Does your solution require these? – Snorex Feb 16 '17 at 16:37
  • Thanks for the comment, I have tried breaking the data apart and move all of the dates with the same ID onto the same row, so that I can use a case when to check along the row, but that wasn't successful because the count of A's is not the same as the count of B's. – Joe Chan Feb 16 '17 at 16:52

1 Answers1

0

Question: why does ID=C have DOUBLE_SOLD=1 in your example? There is no other ID=C to compare with. In any case, you'll want to sort by ID date and then lag DATE by ID to make the comparison using simple logic:

proc sort data = have;
    by ID DATE;
run;

** edit sales to 1/0 **;
data have1; set have;
    array a(3) APPLE ORANGE PEAR;
    do i=1 to dim(a);
        if a(i)>0 then a(i)=1;
    end;
    drop i;
run;

data want; set have1;
    by ID;
    LAG_DATE=lag(DATE);
        if first.ID then LAG_DATE = .;
        if LAG_DATE ne . then do;
            if (DATE-LAG_DATE)<=7 and sum(APPLE,ORANGE,PEAR)>0 then DOUBLE_SOLD=1;
                else if (DATE-LAG_DATE)>7 or sum(APPLE,ORANGE,PEAR)=0 then DOUBLE_SOLD=0;
        end;
        else do;
            if sum(APPLE,ORANGE,PEAR)>1 then DOUBLE_SOLD=1;
                else DOUBLE_SOLD=0;
        end;
    format LAG_DATE mmddyy10.;
run;
kstats9pt3
  • 799
  • 2
  • 8
  • 28
  • Edited to add condition that if same ID but > 7 days then `DOUBLE_SOLD=0` – kstats9pt3 Feb 16 '17 at 18:56
  • ID = C has double sold, because it sold an apple and a pear. I already have a working solution for that, so I did not clarify that in the original statement, but if more than 1 fruit are sold on the same day, we also want to set double sold = 1. – Joe Chan Feb 16 '17 at 19:01
  • Now you'll want to add a step to make any sales > 1 just equal to 1 so the sum is counting how many different fruits sold. See how that works for you. – kstats9pt3 Feb 16 '17 at 19:11
  • Note that in your last `else do;` statement you'll want the sum > 1 instead of > 0 to indicate more than one fruit sold. – kstats9pt3 Feb 16 '17 at 19:21
  • This code that you are using, is slightly different from the ones I am using. To sort, I would just ORDER BY ID, DATE; What does the very first line, =have do? – Joe Chan Feb 16 '17 at 21:39
  • `have` is just a placeholder for what your actual dataset is named. You'll see it on this site often. If your dataset is named `fruits` it'll be `proc sort data = fruits;`. – kstats9pt3 Feb 16 '17 at 21:41
  • This is perfect! Thank you so much! – Joe Chan Feb 17 '17 at 00:50