0

The data structure is:

State County Date AQI Pollutant
Alabama Baldwin 5/2/2017 68 ozone
Alabama Baldwin 6/4/2017 102 PM2.5
Alabama Dekalb 6/6/2017 105 PM10

I am trying to count days/year where AQI is >100, by pollutant, county, and state, such that the output looks like:

State County ozone PM2.5 PM10
Alabama Baldwin
Alabama Dekalb

Where the table values represent the # of days/year with AQI>100 for that state and county, by each pollutant. Each state and county combination should only appear once per row.

Is there a straightforward way to do this in SAS without subsetting the data? I have tried some combinations of proc sql and proc report, but can't produce what I'm looking for. ALSO, is there a way to evaluate if there are multiple pollutants with AQI>100 on the same date for the same county and state? For example, in the first table above, if there was an observation for Alabama, Baldwin, 6/4/2017, 107, ozone.

mennit
  • 3
  • 3

2 Answers2

1

Use can use Proc TABULATE with CLASS Pollutant or Proc REPORT with DEFINE Pollutant/ACROSS

Example:

Classdata is every combination and used with tabulate to ensure every county appears in the output, even when they never have aqi>100

proc sql;
  create table everycombo as
  select state, county, pollutant
  from
  ( select distinct state, county from have )
  cross join
  ( select distinct pollutant from have )
  ;


proc tabulate data=have classdata=everycombo;
  class state county;
  class pollutant / order=internal;
  var aqi;

  table state*county , pollutant * N ;

  where aqi > 100;
run;

REPORT Example:

In order to show all counties, even those with persistently low AQI, a binary flag value is calculated. The sum of values is the number of events.

Compute statements are used to propagate the state in all report rows.

data report;
  set have;
  if aqi > 100 then aqiflag=1;
run;

proc report data=report;

where cats(state) = '1';

  columns state rstate county aqiflag,pollutant;
  define state / group noprint;
  define county / group ;
  define rstate / computed;
  define pollutant / ' ' across;
  define aqiflag / ' ' sum;          /* sum the flag */
  compute before state;
    stateg = state;                  /* capture value */
  endcomp;
  compute rstate;
    rstate = stateg;                 /* propagate */
  endcomp;
run;
Richard
  • 25,390
  • 3
  • 25
  • 38
  • In the TABULATE example above, how would I ammend to output "0" if aqi <100? These values currently show as missing. I want to retain values as missing if there are no values for that pollutant, but change to 0 if there are aqi values <100 for that pollutant. – mennit May 10 '23 at 16:04
  • You can use a custom format. Something like `proc format; value aqirpt low-<100 = 0;` and in the tabulate add `format aqi aqirpt;` – Richard May 10 '23 at 23:59
0

Here is enough to get you started.

  • Sample data in a data step
  • Use PROC MEANS to summarize data on the same day, in this case it takes the average value for that day
  • Summarize the value for each metric using SQL to count the number of days of observations
  • Transpose data into a reporting format
data have;
infile cards dlm=' ' dsd truncover;
informat state $20. County $20. Date mmddyy10. AQI 8. Pollutant $20.;
input State County  Date    AQI Pollutant;
cards;
Alabama Baldwin 5/2/2017 68 ozone
Alabama Baldwin 6/4/2017 102 PM2.5
Alabama Baldwin 5/4/2017 75 ozone
Alabama Baldwin 7/4/2017 88 PM2.5
Alabama Dekalb 6/6/2017 105 PM10
Alabama Dekalb 6/6/2017 107 PM10
;;;;
run;

proc means data=have noprint nway;
class state county pollutant date;
var aqi;
output out=daily_data mean(aqi)=;
run;

proc sql;
create table summary as 
select state, county, pollutant, sum(aqi>100) as nDaysGT100, count(*) as nDays, calculated nDaysGT100/calculated nDays as pct_over_100
from daily_data
group by state, county, pollutant;
quit;

proc transpose data=summary out=want;
by state county;
id pollutant;
var nDaysGT100;
run;

proc transpose data=summary out=want_pct;
by state county;
id pollutant;
var pct_over_100;
run;
Reeza
  • 20,510
  • 4
  • 21
  • 38