So I have a relatively complex process which pulls down a ton of data from a netezza data warehouse. I was tasked with adding two new fields to the process and went through the usual rigamarole of adding the new fields to every sort, group, by, etc. and although the fields come through I consistently have my data run high by ~12%.
As a diagnostic tool I plugged in proc summaries after every clause that pulls in or merges new data and I isolated it down to two proc SQL statements
proc sql;
create table ACTUALALL_CD2 as
select stay_y, timeframe, daysout, Stay_mo
from ACTUALALL_CD
where stay_y = &curyr
group by stay_y, timeframe, daysout, Stay_mo
order by daysout, Stay_mo;
quit;
proc sort data=ACTUALALL_CD2 nodupkey;
by stay_y timeframe daysout Stay_mo;
run;
proc sql;
create table ACTUALALL_CD3 as
select
a.WeekN, a.timeframe, a.timen, a.week, a.daysout,
a.wdwe,a.marsha, a.room_pool_cd, a.cd_tier_rpgm, a.month_cal_id,
a.market_prefix_cd , a.yrpd2, a.stay_y, b.Stay_mo,a.Curr_code,
a.fxmonth, a.fxrate,a.yrpd, a.yrmon, a.Month,
a.RN_TY as RNTY, a.rev_ty as RVTY, a.USDRVTY, a.RVTY_CDUS,
a.Holiday_date, a.RN_LY as RNLY,
a.rev_ly as RVLY, a.USDRVLY, a.RVLY_CDUS
from ACTUALALL_CD a
inner join ACTUALALL_CD2 b
on b.daysout = a.daysout;
quit;
In my original code I lose one value from from the daysout field (value 7 in a field with values 1-18) (which I want to do) and that,in turn cuts my output value. In comparison this code (only differences between this and the legacy code are the added vales a.room_pool_cd and a.cd_tier_rpgm).
I need to figure out how to make this code cut out that value as well. I've run this by two guys with a combined ~40 years of sas experience and they came up short. Suggestions for edits or rewrites would be greatly appreciated.