Given the following test data:
data test;
input A B;
cards;
1 2
1 1
1 2
run;
NOTE: The data set WORK.TEST has 3 observations and 2 variables.
I am aware that proc sort can behave unexpectedly if you don't sort by a whole key, or even when you appear to sort by the whole key but there is a keep statement in force:
proc sort data=test out=test_dedup_works nodup;
by a _all_;
run;
NOTE: There were 3 observations read from the data set WORK.TEST.
NOTE: Duplicate BY variable(s) specified. Duplicates will be ignored.
NOTE: 1 duplicate observations were deleted.
NOTE: The data set WORK.TEST_DEDUP_WORKS has 2 observations and 2 variables.
proc sort data=test out=test_dedup_fails nodup;
by a;
run;
NOTE: There were 3 observations read from the data set WORK.TEST.
NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.TEST_DEDUP_FAILS has 3 observations and 2 variables.
proc sort data=test (keep=a) out=test_dedup_alsofails nodup;
by a;
run;
NOTE: There were 3 observations read from the data set WORK.TEST.
NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.TEST_DEDUP_ALSOFAILS has 3 observations and 1 variables.
What's new to me is that trying to deduplicate the resulting not-actually-deduplicated dataset using PROC SQL also fails to remove duplicates:
proc sql;
create table test_dedup_eventhisfails as
select distinct a
from test_dedup_alsofails;
quit;
NOTE: Table WORK.TEST_DEDUP_EVENTHISFAILS created, with 3 rows and 1 columns.
Is this a bug which is documented somewhere or am I doing something wrong?