0

I am looking to find the average of valueZ for each individual based on their unique start and end date. Exposure X has numerous values for each day, for each location, as such (There are actually 23 locations with over 300 values & dates per site):

data dataset2;
input date location valueZ;
datalines;
1/1/2016 1 0.028
1/1/2016 1 0.022
...
2/8/2016 1 0.041
2/8/2016 1 0.044
1/1/2016 2 0.056
...
8/8/2016 2 0.089
1/1/2016 3 0.029
...
11/8/2016 3 0.083
...
1/1/2016 4 0.081
...
12/8/2016 4 0.019
...
10/30/2016 23 0.063
;

The individuals in dataset1 are linked to dataset 2 by the location as such:

data dataset1;
input individual location start_date end_date;
datalines;
1 1 1/1/2016 12/31/2016
2 1 3/12/2016 9/4/2016
3 2 2/5/2016 11/5/2016
4 19 9/30/2016 10/3/2016
5 23 4/12/2016 12/12/2016
...
305 16 1/20/2016 5/15/2016
;

So, I would like to end up with the average of valueZ based on the location indicated in dataset2 from start_date to end_date for each individual. Can someone please help!

Danielle
  • 13
  • 5

1 Answers1

0

Something like this?

proc sql;
  create table want as
  select d1.individual
        ,d1.location
        ,avg(d2.valueZ) as avg_value
  from dataset2 d2
  join dataset1 d1
    on d1.location=d2.location
    and d2.date between d1.start_date and d2.end_date
  group by d1.individual, d1.location
quit;
user2877959
  • 1,792
  • 1
  • 9
  • 14
  • Yes! This is so much cleaner than what I was trying to do in IML and it worked wonderfully. Thank you! – Danielle Sep 29 '17 at 18:18
  • Is there a way to produce the number of days that were missing values within each average? – Danielle Sep 29 '17 at 18:23
  • Dates with missing values of VALUEZ? Dates that do not appear in the data? You might try COUNT(DISTINCT d2.date) to see how many different date values you have. You could then subtract from number of days from start to end to see how many where missing. – Tom Sep 29 '17 at 18:39
  • perfect. Thank you so much for your help. – Danielle Sep 29 '17 at 19:01