
I have a scatter plot of calls / time. My x variable is the date (Day/Month) and my Y variable is a number of calls on each date. I would like to plot two regression lines using PROC SGPLOT REG, one for 2019 and one for 2020. However, when I try to do this, all I get is a regular scatter plot with no regression lines. Here is my code:

 proc sgplot data=intern.bothphase1;
 reg x=date y=count / group=Year;
 label count="Calls Per Day" year="Year";
 Title "Comparison of EMS Calls per Day 1/1 - 3/31 in 2019 vs. 2020";
 run;

The scatter plot comes up without issue (2019 and 2020 values in different colors) but I want to see how the trends differed between the two time periods, so I really want to get the regression lines on there. Can anyone help?

I imagine this is because I concatenated my day and month with a "/", which makes it a character variable, so SAS cannot calculate the regression. I did this so I could use year as a class variable. I still have the original date variable in my table; is there a way to get SAS to give me the month/day from it as a numeric variable?

Thanks!

EDIT: I used a SAS date value and changed the format to mm/dd, but this doesn't help: the regression lines sit at opposite ends of the graph rather than overlapping (picture attached). This is because SAS dates are stored as the number of days since 1/1/1960, so the 2019 and 2020 dates are a year apart on the axis. What I want is for mm/dd to correspond to the numbers 1-365, so that the two regression lines overlap over the same time period and show how the trend changed from one year to the next. Does anyone know how I can do this?

keherder
  • You cannot do regression with character variables so you're right there. Convert it to a SAS date and apply a format to it. You can have a new variable that has the year or even restructure your data so that you have different years in different columns. – Reeza Oct 29 '21 at 03:04
  • A date value with format `mmddyy5.` will appear as `mm/dd`. So compute a date value if necessary, and use that for x. – Richard Oct 29 '21 at 13:36
  • Hi all, thanks for the help, but this didn't work. Since dates in SAS are number of days since 1/1/1960, changing the format didn't help. What I need the mm/dd to correlate to are numbers 1-365 so the regression lines overlap. Any ideas on how I could do this? I will attach a picture of my problem to the body of the question. – keherder Oct 30 '21 at 00:13

1 Answer


So two steps here: first, you need to generate a "day" value that runs 1-365, so let's just subtract January 1 of the same year from each date.

data have;
  do date = '01JAN2019'd to '31DEC2020'd;
    count = 25+2*rand('uniform');
    year = year(date);
    if month(date) le 3 then output;
  end;  
  format date date9.;
run;

data adjusted;
  set have;
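  date_fixed = date - intnx('year',date,0,'b') + 1;  *current date minus Jan 1 of the same year, plus 1 so that Jan 1 = day 1;
  format date_fixed date5.;                          *this does not actually affect the graph axis, oddly;
  *note that DATE5. displays the value 0 as 01JAN, so with the +1 the value for Jan 1 is labeled 02JAN;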
run;


 proc sgplot data=adjusted;
 reg x=date_fixed y=count / group=Year;
 xaxis valuesformat=date5.;                   *this seems to be needed for some reason;
 label count="Calls Per Day" year="Year";
 Title "Comparison of EMS Calls per Day 1/1 - 3/31 in 2019 vs. 2020";
 run;

Then we add the XAXIS statement, because for some reason the axis won't obey the DATE5. format attached to the variable, so we force it there. (You could also use MMDDYY5., as Richard noted in the comments.)

Here is what I get. You can use other axis options to further limit things, so for example 01APR doesn't show up.

[Image: regression plot showing blue and red lines overlaid on a scatter plot, with a single x-axis running from 01JAN to 01APR]
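As one possible way to limit the axis, here is a minimal sketch that plots against a plain numeric day-of-year variable and sets the tick values explicitly. The variable name `DayOfYear`, the seed, the tick spacing, and the axis label are assumptions for illustration and are not part of the original answer. Because no date format is attached, the axis shows day numbers rather than dates.

```sas
/* Hypothetical sample data: one row per day, Jan 1 - Mar 31, for 2019 and 2020 */
data have;
  call streaminit(2021);                /* fixed seed so the sample is reproducible */
  do date = '01JAN2019'd to '31DEC2020'd;
    if month(date) le 3 then do;
      count = 25 + 2*rand('uniform');
      year = year(date);
      output;
    end;
  end;
  format date date9.;
run;

data adjusted;
  set have;
  /* Plain numeric day of year (1 = Jan 1), with no date format attached */
  DayOfYear = date - intnx('year', date, 0, 'b') + 1;
run;

proc sgplot data=adjusted;
  reg x=DayOfYear y=count / group=year;
  /* Ticks every 10 days from day 1 to day 91; day 91 is Mar 31 in leap year 2020 */
  xaxis values=(1 to 91 by 10) label="Day of Year";
  label count="Calls Per Day" year="Year";
run;
```

With real data, only the second DATA step and the PROC SGPLOT step would apply. The first step exists only to make the sketch runnable.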

Joe
  • Thanks for all your help! I had some trouble with the do statement, however. It created a million observations from my original set of 900. Not sure what I did wrong? I ended up finding the following code to get my date variable: DayOfYear = date - intnx('year', date, 0, 'b') + 1; – keherder Nov 01 '21 at 05:05