2

I have 3 tables that I want to merge, each with a different column of interest. I also have an id variable that I want to do separate merges "within" id. The idea is that I want to merge X, Y, and Z by date (within ID), and have missing values if that date does not exist for a particular variable.

Table X:
ID     Date         X
1      2012-01-01   101
1      2012-01-02   102
1      2012-01-03   103
1      2012-01-04   104
1      2012-01-05   105
2      2012-01-01   150

Table Y:
ID     Date         Y
1      2012-01-01   301
1      2012-01-02   302
1      2012-01-03   303
1      2012-01-11   311
2      2012-01-01   350

Table Z:
ID     Date         Z
1      2012-01-01   401
1      2012-01-03   403
1      2012-01-04   404
1      2012-01-11   411
1      2012-01-21   421
2      2012-01-01   450

Desired Result Table:
ID     Date         X     Y     Z
1      2012-01-01   101   301   401
1      2012-01-02   102   302   .
1      2012-01-03   103   303   403
1      2012-01-04   104   .     404
1      2012-01-05   105   .     .
1      2012-01-11   .     311   411
1      2012-01-21   .     .     421
2      2012-01-01   150   350   450

Any ideas how to write this SQL statement? I've tried messing around with "full joins" and where statements for cross products, but I keep getting duplicate values for some of my ID-date combinations, or sometimes no ID.

Any help would be appreciated.

doxguy
  • 185
  • 1
  • 3
  • 11

1 Answers1

3

Joins can be tricky things. My usual approach is to form the set of Keys first, and then use those keys to get what I want.

SELECT source.ID, source.Date, x.X, y.Y, z.Z
FROM
(
  SELECT ID, Date
  FROM TableX
  UNION
  SELECT ID, Date
  FROM TableY
  UNION
  SELECT ID, Date
  FROM TableZ
) as source
LEFT JOIN TableX x ON source.ID = x.ID AND source.Date = x.Date
LEFT JOIN TableY y ON source.ID = y.ID AND source.Date = y.Date
LEFT JOIN TableZ z ON source.ID = z.ID AND source.Date = z.Date
ORDER BY source.ID, source.Date
Amy B
  • 108,202
  • 21
  • 135
  • 185
  • I can think of another approach that involves a PIVOT that may use less IO. – Amy B Feb 11 '12 at 02:19
  • This becomes a problem when you're actually merging a subset of Table X, Y, and Z, and that subset is expensive to compute, because you end up having to compute the subsets twice for each table, since you are referencing each table from two different 'from' clauses. Still, this is probably the cleanest, fastest approach. I actually just asked a similar question, because I'm looking for a different answer to avoid generating Table X, Y, and Z multiple times. – Triynko Mar 06 '13 at 01:43