At work I have a 200M rows (300+ columns, ~200GB) claims table that is indexed by member number (memno) and date of service (dos).
Twice a month I have to pull 50,000 member number's claims from a fixed date range (say 1/1/2017 to 5/1/2018), where I only need a limited columns from claims.
MyInputList only has 1 column (memno).
proc sql;
create table myClaims as
select a.claimno, a.dos, a.memno
from s.claims a inner join myInputList b
on a.memno = b.memno
where a.dos between '01Jan2017'd and '01May2018'd;
quit;
It usually takes about 3-4 hours to run by PROC SQL. The data themselves are not hosted by a RDMS, I read many SAS essay that PROC SQL does a cartesian product and since I don't need all 300 columns per record, I wonder if it will be better using hash table.
My question: Can I give "hints" to the hash table so it can leverage the indexed columns (memno, dos)
?
data myClaimsTest (drop=rc);
if 0 then set myInputList;
declare hash vs(hashexp:7, dataset:'myInputList');
vs.definekey('memno');
vs.definedata();
vs.definedone();
do until (eof);
set s.claims (keep=claimno dos) end=eof;
if vs.find()=0 then output;
end;
stop;
run;
New section (added by Richard)
Run this code to get a listing of the variables and the indices.
dm "clear output"; ods listing; ods noresults; options nocenter; title;
proc contents varum data=all_claims;
run;
dm "output" output; ods results;
Copy and paste the bottom of the output here. Replace this example with your actual listing.
Variables in Creation Order
# Variable Type Len Format
1 claim_id Num 8
2 member_id Num 8
3 claim_date Num 8 YYMMDD10.
Alphabetic List of Indexes and Attributes
# of
Unique Unique
# Index Option Values Variables
1 PICK YES 333338 member_id claim_date