-3

I have a large table A, with around 5M observations and want to join it with 3 other tables with around 400k observations each.

The tables will be joined using different keys to A. All tables have just 1 occurrence of the joining keys only, so they will be one to one joins

Is it more efficient to use just 1 proc sql block to make the 3 joins on A or write 3 separate proc sql blocks to make 1 join each?

Dirk Horsten
  • 3,753
  • 4
  • 20
  • 37
  • What are the sizes and number of observations of the tables? Are they indexed on the relevant (foreign) keys? are the tables sorted and how? Did you consider a sollution with hash tables in a datastep too? – Dirk Horsten Jan 12 '22 at 19:57
  • 1
    The main table has around 5M entries and the tables to be joined have around 400K rows each. The tables to be joined are sorted on the joining(foreign) key. – Vishrut Jaipuria Jan 14 '22 at 04:04

1 Answers1

0

The only richt answer to your question is: It does not matter.

In contrast to many other procedures and to the data step, proc sql does not wait for a run statement to execute: each sql statement is executed individually as if it were in a separate proc sql.

However, you can do a lot to improve your performance by creating indexes on the keys in the 3 smaller tables. (Google sas sql "create index") before joining them.

After that, it might be benificial to do all joins at once:

select <whatever you need>
from A
left join B on B.key1 = A.key1
left join C on C.key2 = A.key2
left join D on D.key3 = A.key3;

because then, you only read and write the big table once.

Dirk Horsten
  • 3,753
  • 4
  • 20
  • 37