0

We are in the process of transitioning our code to the SAS Grid where we use Enterprise Guide 6.1 to run SAS 9.4. One piece of code is giving different results than what we get when running SAS 9.3 on PC. I've created an example dataset and code which returns varying results. The results of running on the Grid (13 rows with duplicates) are the expected behavior when including a column in the select statement, but not aggregating it or using it in the group by. The results of running it using SAS 9.3 (6 distinct rows) are what we want, but not typical SAS behavior. I have since modified the code (added denom to the group by) to get the desired results on the Grid, but wonder why the code returns different results when run in different environments. Any ideas?

ETA: the Grid uses SAS 9.4 and the modified code uses denom in the group by. Also, I was unable to find documentation of changes to Proc SQL from V 9.3 to V 9.4.

proc sql;
  create table work.test
    (state char(2)
    ,county char(20)
    ,city char(20)
    ,id char(6));    
quit;

proc sql;
  insert into work.test (state, county, city, id)
  values ('OH', 'Hamilton', 'Cincinnati', 'abc')
  values ('OH', 'Hamilton', 'Cincinnati', 'def')
  values ('OH', 'Hamilton', 'Cincinnati', 'ghi')
  values ('OH', 'Hamilton', 'Mariemont', 'jkl')
  values ('OH', 'Hamilton', 'Mariemont', 'mno')
  values ('OH', 'Franklin', 'Columbus', 'pqr')
  values ('OH', 'Franklin', 'Columbus', 'stu')
  values ('TX', 'San Patricio', 'Ingleside', 'abc')
  values ('TX', 'San Patricio', 'Taft', 'abc')
  values ('TX', 'Nueces', 'Corpus Christi', 'abc')
  values ('TX', 'Nueces', 'Corpus Christi', 'xyz')
  values ('TX', 'Nueces', 'Corpus Christi', 'tuv')
  values ('TX', 'Nueces', 'Corpus Christi', 'def');
quit;

proc sql;
  create table freqs as
  select a.state
    , a.county
    , a.city
    , count(city) as numer
    , denom
    , round(count(city)/denom*100,.1) as percent
  from work.test as a,
  (select state, county, count(*) as denom from work.test group by state, county) as b
  where a.state=b.state and a.county=b.county
  group by a.state, a.county, a.city;
quit;
  • 1
    Raise a ticket with SAS support. – Tom May 26 '16 at 20:53
  • See the edit to my answer. Looks like a known issue that was fixed in a 9.3 Hot Fix: http://support.sas.com/kb/46/832.html – DomPazz May 26 '16 at 21:05
  • Note that SAS 9.2 does the same thing as SAS 9.3. It is SAS 9.4 that has lost the knowledge the DENOM is at the STATE,CITY level and so does NOT need to be remerged. – Tom May 26 '16 at 21:08
  • I wouldn't say it "lost the knowledge", 9.4 operates how the ANSI standard would have it work. They made the procedure more correct. – DomPazz May 26 '16 at 21:28

1 Answers1

1

At the end of the day, your query is poorly written.

If you look at the log, you will see this note:

NOTE: The query requires remerging summary statistics back with the original data.

The reason for this is that denom is not a summary variable. This is causing your extra records. This should be expected behavior of your query - 13 rows with dups.

Order of processing is to join the tables, and then perform summary calculations. When the GROUP BY statement is not complete, you get that NOTE:

Add b.denom to the group by and you get your expected 6 rows.

As to the WHY SAS gives you different results: I suspect the version of SAS on the GRID is different than what you have on the PC.

EDIT: I think this is the SAS Note that details the problem. They fixed it in a hot fix in 9.3 and would have been part of 9.4 out of the box: http://support.sas.com/kb/46/832.html

DomPazz
  • 12,415
  • 17
  • 23
  • Yes. I agree. I would have written the code differently myself. – Felicia Burke May 26 '16 at 19:41
  • I did see that Note when running on the Grid which is using SAS 9.4. That is what I would have expected. I have already modified the code to include denom in the group by for running on the Grid. I looked through the SAS documentation for 9.4 and did not find mention of any changes to Proc SQL. Though there is now FedSQL. I'm just perplexed that SAS 9.3 didn't generate that Note and only output 6 rows. This is not typical behavior for 9.3. – Felicia Burke May 26 '16 at 19:51
  • Thanks! That is why I didn't see it in the documentation for 9.4. I will have our admin apply the Hot Fix (and check for others). – Felicia Burke May 27 '16 at 15:01