0

I've built a star schema with a number of dimensions, but I'd like to set a default value of 0 for the fact table if no matching value is found for that dimension. Here's my current solution:

Left join and case statements

SELECT 
    CASE WHEN d1.ID is NULL THEN 0 ELSE d1.ID END,
    CASE WHEN d2.ID is NULL THEN 0 ELSE d2.ID END
FROM OriginalTable OT
LEFT JOIN Dim1 d1
    ON OT.field1 = d1.field
LEFT JOIN Dim1 d2
    ON OT.field2 = d2.field

Is there a more efficient way?

user3457834
  • 314
  • 3
  • 12
  • 1
    Use SSIS, cache the dimension data and use a cached Lookup transformation to add the correct dimension table ID as the rows pass by. This way you avoid joins, lookups and are able to use [minimally logged insertions](https://learn.microsoft.com/en-us/sql/relational-databases/logs/the-transaction-log-sql-server?view=sql-server-2017#MinimallyLogged) into the fact table. Insertions can be batched which increases performance as well – Panagiotis Kanavos Nov 01 '18 at 12:40
  • Using joins like this to read data, lookup IDs and insert them into the target table means you need to lock both the source and target, perform a large join between a large table and the dimensions and finally insert the data into the fact table into a fully logged mode. It also means that the data may spill into tempdb as SQL Server tries to sort the data and calculate matches – Panagiotis Kanavos Nov 01 '18 at 12:41

1 Answers1

3

In SQL server you can use both IsNULL and coalesce, coalesce returns the first value that isn't null so

SELECT 
coalesce(d1.ID,0),
coalesce(d2.ID,0)
FROM OriginalTable OT
LEFT JOIN Dim1 d1
   ON OT.field1 = d1.field
LEFT JOIN Dim1 d2
   ON OT.field2 = d2.field 

should do the trick

Hedinn
  • 864
  • 4
  • 7
  • 1
    Note that the resulting expression from coalesce and isnull is different. the resultant of ISNULL is determined to be NOT NULL while the outcome of a COALESCE call is NULLable. [see here](https://dba.stackexchange.com/a/30013/95107). so while they could perform closely (as in performance measures) they aren't exactly the same. I'd expect the same execution plan from the OP `case` and your implementation of `coalesce` though i'd prefer `coalesce` here. – S3S Nov 01 '18 at 13:22
  • 1
    It's also worth noting that while this is more compact syntactically, under the hood the SQL engine expands that `COALESCE` out into a `CASE` that's fundamentally identical to the code the OP posted, so there's no performance difference between this and the code as written. Though this is, in fact, the way I'd tend to write it for readability's sake. – Eric Brandt Nov 01 '18 at 13:26