0

So I have a relatively complex process which pulls down a ton of data from a netezza data warehouse. I was tasked with adding two new fields to the process and went through the usual rigamarole of adding the new fields to every sort, group, by, etc. and although the fields come through I consistently have my data run high by ~12%.

As a diagnostic tool I plugged in proc summaries after every clause that pulls in or merges new data and I isolated it down to two proc SQL statements

proc sql;
  create table ACTUALALL_CD2 as 
  select stay_y, timeframe, daysout, Stay_mo
  from ACTUALALL_CD
  where stay_y = &curyr
  group by stay_y, timeframe, daysout, Stay_mo
  order by daysout, Stay_mo;
quit;

proc sort data=ACTUALALL_CD2 nodupkey; 
  by stay_y timeframe daysout Stay_mo; 
run;

proc sql;
  create table ACTUALALL_CD3 as
  select 
    a.WeekN, a.timeframe, a.timen, a.week, a.daysout,
    a.wdwe,a.marsha, a.room_pool_cd, a.cd_tier_rpgm, a.month_cal_id,   
    a.market_prefix_cd , a.yrpd2, a.stay_y, b.Stay_mo,a.Curr_code,
    a.fxmonth, a.fxrate,a.yrpd, a.yrmon,  a.Month, 
    a.RN_TY as RNTY, a.rev_ty as RVTY, a.USDRVTY, a.RVTY_CDUS, 
    a.Holiday_date, a.RN_LY as RNLY,
    a.rev_ly as RVLY, a.USDRVLY, a.RVLY_CDUS
  from ACTUALALL_CD a
  inner join ACTUALALL_CD2 b 
    on b.daysout = a.daysout;
quit;

In my original code I lose one value from from the daysout field (value 7 in a field with values 1-18) (which I want to do) and that,in turn cuts my output value. In comparison this code (only differences between this and the legacy code are the added vales a.room_pool_cd and a.cd_tier_rpgm).

I need to figure out how to make this code cut out that value as well. I've run this by two guys with a combined ~40 years of sas experience and they came up short. Suggestions for edits or rewrites would be greatly appreciated.

halfer
  • 19,824
  • 17
  • 99
  • 186
Kumachorou
  • 37
  • 3
  • Can you show the code prior to your changes as well? Also, can you provide any notes and warnings (if any) from running these? And the number of observations being read in/output by each step. – Robert Penridge Jun 23 '16 at 17:36
  • Check for dups on daysout var. In both datasets. – Reeza Jun 23 '16 at 20:28

1 Answers1

0

Nothing looks unusual in the supplied queries, so I suggest the following just to make sure:

  1. Restart your SAS session. Make sure that nothing strange is interacting with prior runs.
  2. Make sure you didn't accidentally press the F4 key and copy/paste any submitted code into your editor window by accident (if you are using the Base SAS editor).
  3. Search the log for the words error: or warning: or repeats of BY values.

If none of these work, then the real way to solve it is to identify keys that have different counts (between the original query and the new). Restrict both the original and new queries to just (one of) those keys and start debugging from there. Remove stuff from the new query until the number of records gives the expected result, then you have found your problem. This is the one guaranteed way to isolate and fix the problem, but is not the quick fix people are always looking for.

Robert Penridge
  • 8,424
  • 2
  • 34
  • 55