1

I'm trying to write some code that merges on a new column of data onto the master dataset.

As this is being written as a macro, I would prefer that no new dataset is created in the process.

I have achieved this using the following code:

    PROC SQL;
        ALTER TABLE table1
         ADD ADD_TYP Character length=12 format=$12.;
    QUIT;

    PROC SQL;
        UPDATE table1 as A
            SET ADD_TYP = (SELECT B.ADD_TYP FROM Address_Index as B 
                            WHERE A.Key=B.Key);
    QUIT;

However, the processing time is horrendous. Is this normal? The left join is done in 5 seconds.

I am considering just doing "CREATE TABLE table1" i.e. Overwriting but with error messages in SAS

Joe
  • 62,789
  • 6
  • 49
  • 67
Wolfspirit
  • 155
  • 3
  • 14
  • 1
    I think in SAS SQL Alter Table will still create a new table. Ideally, if this is run repeatedly you may want to find a different way of storing your data, long over wide. – Reeza Feb 22 '16 at 02:00
  • Based on my runs, its still the one table. The alter table command adds a unique column of data so its not really making it "longer" @Reeza – Wolfspirit Feb 22 '16 at 02:36
  • Are you running this repeatedly or once? If repeatedly, then I have to assume there's some consistent logic to the data being added. It is one table - I'm pretty sure SAS drops old table and recreates new table with old values and new column. In long run this can get resource intensive. Again, this only matters if you are running this more than once. – Reeza Feb 22 '16 at 02:42
  • Just to be used once per table. @Reeza – Wolfspirit Feb 22 '16 at 03:09
  • SAS has to be dropping/recreating the table - unless the additional space is inside the block size I guess, but that seems unlikely as most of the time that won't be true. Otherwise where do those 12 bytes go? – Joe Feb 22 '16 at 16:00

1 Answers1

1

5 seconds doesn't seem outrageous.

But for the update query, you want an index on Address_Index(Key, ADD_TYP). That should give you the best performance.

I should note, however, that it is probably better to use a JOIN to get the information. You can create a view that contains the two tables with the JOIN, so the work is only done when you need to fetch the result.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 5 seconds was using the left join, not the alter table code. It takes 3 hours via Alter table! Thanks for the index suggestion I'll look into it. @Gordon – Wolfspirit Feb 22 '16 at 02:33
  • Hey @Gordon - The index takes about a minute to build but the Update is then done in 5 seconds! I'd consider this a vast improvement :) – Wolfspirit Feb 22 '16 at 03:10