2

I have always used multiple joins when I need to reference same data on different columns. Like this:

select
    a.ContactName
    ,b.ContactName
    ,c.ContactName
from
    OrderBase as o
    left join ContactBase as a on o.contactid1 = a.ContactId
    left join ContactBase as b on o.contactid2 = b.ContactId
    left join ContactBase as c on o.contactid3 = c.ContactId

I have always thought this was very inefficient. Loading the table 3 times for 3 lookups. Are there a more efficient way to do this? I don't mean the writing of code, but the use of resources. Are there any good tools implemented in the later versions of SQL server that addresses this?

Sweetspot
  • 91
  • 1
  • 9
  • 1
    If you use the right indexes this should be the best approach... – Shnugo May 27 '16 at 10:00
  • You need 3 columns for contact 1,2 and 3. so there is no other way to get that result. how big is your database? if it is not big enough, why you think this is not efficient. if you have performance issue, you can use schema_binding view for this particular query. it uses disk storage though. BTW, I think your ContacBase table should not be big. adding index improves performance if your ContactBase table is big, otherwise it won't improve a lot. – FLICKER May 27 '16 at 10:06
  • OrderBase can add contactName1, contactName2, contactName3 and store where OrderBase updated. Then, no join is needed... – daniel May 27 '16 at 10:08
  • 1
    @daniel, are you saying he should store redundant data?!! – FLICKER May 27 '16 at 10:09
  • Repeating group: https://en.wikipedia.org/wiki/First_normal_form – joop May 27 '16 at 10:10
  • yes..., it is not suggested but if the join performance is poor even right indexes are added... – daniel May 27 '16 at 10:12
  • @joop, what do you mean? each order can have 3 contacts and I think the design is normalized. – FLICKER May 27 '16 at 10:12
  • 1
    @daniel, having redundant data in OLTP database is not a good idea at all. there are many other ways to improve performance. – FLICKER May 27 '16 at 10:13
  • @FLICKER : the repeating group model implies 3 FK constraints, and causes 3 left joins to be needed in this query. [data modelling is all about contraint minimization] – joop May 27 '16 at 10:26
  • does the wiki link provides a solution for this? and what if an order can have 2 contacts nor 3? it seems contacts are optional for an Order – FLICKER May 27 '16 at 10:28
  • I'm after the optimal way to query this kind of tables. 3 lookups isn't a big deal. Let's use same example, but with 150 columns and 30 million rows in Order table, 500 K in contact table. – Sweetspot May 27 '16 at 10:54

1 Answers1

1

The first comment is that storing the same value is multiple columns is often a suspicious pattern. Often, such a data structure is better served using a junction table with one row per "OrderBase" and one per "ContactId". A junction table is not always the best solution, but it often is.

As for your query:

select a.ContactName, b.ContactName, c.ContactName
from OrderBase as o left join
     ContactBase a
     on o.contactid1 = a.ContactId left join
     ContactBase b
     on o.contactid2 = b.ContactId left join
     ContactBase c
     on o.contactid3 = c.ContactId;

This specifically wants an index on ContractBase(ContactId, ContactName). The use of left join for this purpose is entirely reasonable.

With this index, the original data pages are not even needed by the query, because the index "covers" the query -- all columns in the query are in the index. You don't need to worry about loading the index multiple times. Although the index (or table) is logically loaded three times, something called the page cache will probably keep the pages in memory, so the subsequent loads will be very fast.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I'm self-taught, so I'm no expert in the inner workings of the database engine (yet). It seems there is no "better" way to do this (yet). But since I am always learning, can you elaborate on this? How will the logically loaded index handle immense volumes of data? Can it cache unlimited amounts of data? If I join 150 columns and have millions of rows, is this still the best way to do it (not thinking on the writing nightmare. Would probably use xml or dynamic queries to ease the writing). – Sweetspot May 27 '16 at 11:45
  • @Sweetspot . . . You shouldn't worry about performance problems until they pose a problem. Doing three `left join`s like this should not be an issue, assuming the indexes are set up correctly. – Gordon Linoff May 28 '16 at 21:42