
I need to join a table Small with a table Large in Teradata. I select the 4 columns small.A, B, C, D into macro variables, but the variables often exceed the buffer size. So I googled the code below (http://support.sas.com/techsup/technote/ts553.html), which runs the SQL in chunks of, say, 105 records. Now I have two problems:

1. The line "file temp;" does not seem to work for me. The error is: ERROR: Insufficient authorization to access /x/sas/config/Lev1/SASApp/temp.dat.
2. The example joins on only one column, while I have 4 columns A-D to join on.

Can someone please help me? I appreciate your help!

%let chunk=105;

proc sql;
  create view uniq as
  select unique key
    from small
   order by key;
quit;

data _null_;
  file temp;
  set uniq end=end;

  if _n_ = 1 then do;
    put "create table result as"
      / "  select key, data"
      / "  from connection to dbms"
      / "  (select key, data"
      / "   from large where key in("
      / key;
  end;
  else if mod(_n_, &chunk) = 0 and not end then do;
    put "));" //;

    put "insert into result"
      / "  select key, data"
      / "  from connection to dbms"
      / "  (select key, data"
      / "   from large where key in("
      / key;
  end;
  else if end then put "," key "));";
  else put "," key;
run;

proc sql;
  connect to <DBMS> as dbms;
  %inc temp;
  disconnect from dbms;
quit;
ddss12
1 Answer

I think SAS will look for a file called temp in the current directory and try to write to that unless you've previously executed a filename statement telling it that 'temp' is actually a file somewhere else. I'm guessing that you've got read access but not write access in the current directory (i.e. in /x/sas/config/Lev1/SASApp).

Try running this before you run the datastep that writes your SQL and see if you still get the same error:

filename temp "%sysfunc(pathname(work))/temp.sas";

This will tell SAS to write to a file called temp.sas inside your work library - you should have write access there.
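If you want to confirm where a fileref will actually land before writing to it, the pathname function works for filerefs as well as librefs (standard SAS functions; the temp.sas name is just the one suggested above):

```sas
/* Where will "temp" actually be written? PATHNAME works for filerefs too. */
filename temp "%sysfunc(pathname(work))/temp.sas";
%put NOTE: TEMP resolves to %sysfunc(pathname(temp));
```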

As for 'joining multiple columns' - what sort of join are you trying to do? Are all 4 of your variables A-D from the small dataset keys? Do you need to match on all of them? Have you made sure that indexes exist on the Teradata table for all of these variables?

Update:

It would be a lot simpler to do this as a single query with multiple join conditions - I presume this is an option that you've already discarded after your testing has established an unacceptably poor level of performance?

If I understand correctly, you want to join only when all 4 keys from the small table match in the big table. This should still be possible, but I'm not sure how well it will perform on the Teradata side.

Your current code works through your small dataset 105 records at a time, constructing select and insert statements using where clauses along the lines of where key in (row1value, row2value, ..., row105value). Using small sets of records like this makes it more likely that Teradata will use an index, speeding up the query. One approach you could take to obtain a 4-key join would be to construct clauses along the lines of

where (key1 = row1key1 and key2 = row1key2 and key3 = row1key3 and key4 = row1key4)
or (key1 = row2key1 and key2 = row2key2 and key3 = row2key3 and key4 = row2key4)
or ...
or (key1 = row105key1 and key2 = row105key2 and key3 = row105key3 and key4 = row105key4)
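In the data step that writes the SQL, this could be sketched as follows (an illustration only, assuming numeric keys named key1-key4 in both tables; character keys would need quoting, and the chunking/insert logic from the original code would wrap around it in the same mod(_n_, &chunk) pattern):

```sas
data _null_;
  file temp;
  set uniq end=end;  /* uniq would now hold distinct key1-key4 combinations */

  /* First row opens the query; later rows add OR'd conjunctions */
  if _n_ = 1 then
    put "create table result as select * from connection to dbms"
      / "(select key1, key2, key3, key4, data from large where"
      / "(key1=" key1 "and key2=" key2 "and key3=" key3 "and key4=" key4 ")";
  else
    put "or (key1=" key1 "and key2=" key2 "and key3=" key3 "and key4=" key4 ")";

  if end then put ");";
run;
```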

However, I don't know whether Teradata would take advantage of the indexes on your large table when performing this sort of query, so I suggest that you proceed with caution and do some research on how Teradata uses indexes. You might find the proc sql _tree and _method options useful in establishing whether indexes are being used.

It might make more sense to do an initial left join (for 100 rows at a time from your small dataset) on the key with the highest proportion of distinct values in your large dataset (which would make best use of the index), then use a where clause to find matches on the other variables. Or you could do a left join on all 4 conditions. With either of these approaches, you could use the firstobs and obs options to partition your small dataset into suitably small pieces, rather than writing out extensive where clauses.
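The firstobs/obs partitioning could look something like this (a sketch; the macro name chunked_join, the slice dataset and the nobs variable are all illustrative, and the actual pass-through join is left as a comment):

```sas
/* Sketch: walk the small dataset &chunk rows at a time with firstobs=/obs=. */
%macro chunked_join(chunk=105);
  %local i nobs;
  proc sql noprint;
    select count(*) into :nobs trimmed from small;
  quit;
  %do i = 1 %to &nobs %by &chunk;
    /* obs= is an absolute observation number, so this slice is rows
       &i through &i + &chunk - 1 (SAS simply stops at the end of the data) */
    data slice;
      set small(firstobs=&i obs=%eval(&i + &chunk - 1));
    run;
    /* ... left join slice to the Teradata table on all 4 keys here ... */
  %end;
%mend chunked_join;
```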

user667489
  • thank you! Your filename statement works great. To your question, the joins are inner joins. All 4 variables are from the Small table, but they also exist on the TD Large table and are indexed too. I tried to create a single column to join on by concatenating all 4 columns, but that requires all 4 columns to be character. When I join on put(A,char30)||B||C||D the index becomes invalid. Would creating a volatile table Small in TD be a good solution? Is the memory not going to max out? Might performance be much better? Can you provide some insight please? – ddss12 Oct 06 '14 at 20:13