I'm trying to improve the loading performance of a dimensional data warehouse. A typical run of the loading operation loads tens of thousands of records; sometimes it's hundreds of thousands; and semi-annually I need to load millions.
For the loading operation, I build an in-memory record corresponding to a 48-column Fact table row, which is then either updated or inserted (depending on whether a previous version of the record already exists in the Fact table). The in-memory record includes about two dozen foreign keys pointing to various dimension tables. I have indexes on all of those Fact table foreign key columns (and of course I can disable those indexes while loading the data warehouse).
After populating the in-memory record with fresh data for the Fact table record, I add it to the Fact table (update or insert, as above).
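For concreteness, here's a toy sketch of that update-or-insert step. This is Python with SQLite standing in for SQL Server, and the table, column, and key names are invented for illustration (the real Fact table has 48 columns, not 3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny stand-in for the real 48-column Fact table.
cur.execute(
    "CREATE TABLE fact (fact_key INTEGER PRIMARY KEY, dim_a INTEGER, measure REAL)"
)
cur.execute("INSERT INTO fact VALUES (1, 10, 1.5)")

# One in-memory record, freshly populated by the load program.
record = (1, 11, 2.5)

# Upsert: insert a new fact row, or overwrite every column if the key
# already exists (this is the "mindlessly update every column" approach).
cur.execute(
    "INSERT INTO fact (fact_key, dim_a, measure) VALUES (?, ?, ?) "
    "ON CONFLICT(fact_key) DO UPDATE SET "
    "dim_a = excluded.dim_a, measure = excluded.measure",
    record,
)
conn.commit()
```

In the real system this would be a T-SQL MERGE (or separate UPDATE/INSERT) against SQL Server rather than a SQLite upsert, but the shape of the operation is the same.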
I am wondering if I could improve performance in update situations by updating only the columns that have changed, instead of mindlessly updating every column in the Fact table record. Doing this would add some overhead, since my loading program would become more complex. But would I actually gain anything?

Suppose, for example, that one of those dimension table foreign keys hasn't changed. Is there a performance benefit to leaving that particular column out of the UPDATE? I guess it comes down to whether SQL Server internally tracks (counts?) foreign key references between the Fact and dimension tables. If SQL Server does do such tracking, then perhaps skipping the unchanged column would help, since SQL Server would not need to perform its internal tracking operation. But if SQL Server does not do such tracking, then I suppose I should just update all 48 Fact table columns regardless of whether they have changed, since that avoids adding complexity and overhead to my loading program.
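To make the alternative concrete, here's a sketch of the changed-columns-only variant I'm considering, again in Python with SQLite standing in for SQL Server and with invented names. It diffs the incoming record against the stored row and issues an UPDATE that touches only the columns that actually differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny stand-in fact table: two "foreign key" columns instead of ~24.
cur.execute(
    "CREATE TABLE fact (fact_key INTEGER PRIMARY KEY, dim_a INTEGER, dim_b INTEGER)"
)
cur.execute("INSERT INTO fact VALUES (1, 10, 20)")

# Incoming in-memory record: only dim_b differs from what's stored.
incoming = {"fact_key": 1, "dim_a": 10, "dim_b": 99}

row = cur.execute(
    "SELECT dim_a, dim_b FROM fact WHERE fact_key = ?", (incoming["fact_key"],)
).fetchone()

if row is None:
    # No previous version: plain insert.
    cur.execute(
        "INSERT INTO fact (fact_key, dim_a, dim_b) VALUES (?, ?, ?)",
        (incoming["fact_key"], incoming["dim_a"], incoming["dim_b"]),
    )
else:
    # Diff the incoming record against the stored row.
    existing = dict(zip(("dim_a", "dim_b"), row))
    changed = {
        col: val
        for col, val in incoming.items()
        if col != "fact_key" and existing[col] != val
    }
    if changed:
        # Build an UPDATE that mentions only the changed columns.
        assignments = ", ".join(f"{col} = ?" for col in changed)
        cur.execute(
            f"UPDATE fact SET {assignments} WHERE fact_key = ?",
            (*changed.values(), incoming["fact_key"]),
        )
conn.commit()
```

This is the extra complexity I'm weighing: a fetch plus a per-column comparison on every update, in exchange for (possibly) cheaper UPDATE statements on the SQL Server side.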
I'd appreciate any comments or suggestions.