0

I am doing analysis on the Stack Overflow dump.

Problem statement: I have 4 tables and require result in the format given.

Table 1: UserID Year QuestionsOnTopicA
Table 2: UserID Year AnswersOnTopicA
Table 3: UserID Year QuestionsOnTopicB
Table 4: UserID Year AnswersOnTopicB

Desired Output:

UserID Year QuestionsOnTopicA AnswersOnTopicA QuestionsOnTopicB AnswersOnTopicB

UserID column should have entries from all the 4 tables.

I tried performing inner and outer join on the tables but the results were incorrect. Inner join (returns userid present only in first table 1) Outer join (returns other columns only for userid in table 1)

Not sure if union will make sense in this scenario.

Queries are being executed on data.stackexchange.com/stackoverflow

Example

Table 1: 1001, 2010, 5 || 1001, 2011, 3 || 1002, 2010, 4

Table 2: 1001, 2010, 10 || 1001, 2011, 7 || 1002, 2010, 5

Table 3: 1002, 2010, 5

Table 4: 1001, 2010, 10 || 1004, 2011, 5

Output:

1001, 2010, 5 , 10, 0, 10

1001, 2011, 3, 7, 0, 0

1002, 2010, 4, 5, 5, 0

1004, 2011, 0, 0, 0, 5

Laurel
  • 5,965
  • 14
  • 31
  • 57
Mohit Jain
  • 135
  • 1
  • 4
  • 12
  • My first reaction was to question why you have two sets of tables (Q/A) when all that is different between them is the topic. Do you really want to create a new set of tables for every topic? Fix your design and your problem just goes away. – TommCatt Nov 12 '14 at 19:57

4 Answers4

1

Use this SQL may be?

SELECT a.UserID, a.Year,
       a.QuestionsOnTopicA,
       b.AnswersOnTopicA,
       c.QuestionsOnTopicB,
       d.AnswersOnTopicB
FROM Table 1 a,
     Table 2 b,
     Table 3 c,
     Table 4 d
WHERE a.UserID = b.UserID
  AND b.UserID = c.UserID
  AND c.UserID = d.UserID
  AND d.UserID = a.UserID
Praveen Kumar Purushothaman
  • 164,888
  • 24
  • 203
  • 252
1
select coalesce(a.UserID, b.UserID, c.UserID, d.UserID),
       coalesce(a.Year, b.Year, c.Year, d.Year),
       a.QuestionsOnTopicA, b.AnswersOnTopicA,
       c.QuestionsOnTopicB, d.AnswersOnTopicB
from Table1 a full outer join Table2 b on a.UserID = b.UserID and a.Year = b.Year
              full outer join Table3 c on (c.UserID = b.UserID or c.UserID = a.UserID) 
                                          and (c.Year = b.Year or c.Year = a.Year)
              full outer join Table4 d on (d.UserID = c.UserID or d.UserID = b.UserID or d.UserID = a.UserID) 
                                          and (d.Year = a.Year or d.Year = b.Year or d.Year = a.Year);
Multisync
  • 8,657
  • 1
  • 16
  • 20
  • This looks promising however data.stackexchange is giving timeout expired error -"The timeout period elapsed prior to completion of the operation or the server is not responding." I will try again in some time. – Mohit Jain Nov 12 '14 at 16:34
1

Ok, this works as intended:

SELECT  COALESCE(A.UserID,B.UserID,C.UserID,D.UserID) UserID,
        COALESCE(A.[Year],B.[Year],C.[Year],D.[Year]) [Year],
        ISNULL(A.QuestionsOnTopicA,0) QuestionsOnTopicA,
        ISNULL(B.AnswersOnTopicA,0) AnswersOnTopicA,
        ISNULL(C.QuestionsOnTopicB,0) QuestionsOnTopicB,
        ISNULL(D.AnswersOnTopicB,0) AnswersOnTopicB
FROM Table1 A
FULL JOIN Table2 B
    ON A.UserID = B.UserID
    AND A.[Year] = B.[Year]
FULL JOIN Table3 C
    ON COALESCE(A.UserID,B.UserID) = C.UserID
    AND COALESCE(A.[Year],B.[Year]) = C.[Year]
FULL JOIN Table4 D
    ON COALESCE(A.UserID,B.UserID,C.UserID) = D.UserID
    AND COALESCE(A.[Year],B.[Year],C.[Year]) = D.[Year]

Here is a sqlfiddle with a demo of this.

And the results are:

╔════════╦══════╦═══════════════════╦═════════════════╦═══════════════════╦═════════════════╗
║ UserID ║ Year ║ QuestionsOnTopicA ║ AnswersOnTopicA ║ QuestionsOnTopicB ║ AnswersOnTopicB ║
╠════════╬══════╬═══════════════════╬═════════════════╬═══════════════════╬═════════════════╣
║   1001 ║ 2010 ║                 5 ║              10 ║                 0 ║              10 ║
║   1001 ║ 2011 ║                 3 ║               7 ║                 0 ║               0 ║
║   1002 ║ 2010 ║                 4 ║               5 ║                 5 ║               0 ║
║   1004 ║ 2011 ║                 0 ║               0 ║                 0 ║               5 ║
╚════════╩══════╩═══════════════════╩═════════════════╩═══════════════════╩═════════════════╝
Lamak
  • 69,480
  • 12
  • 108
  • 116
0

First of all you should retrieve the data from the tables using inner join. Then you should use SQL Server Pivot as shown in this link.

Ghyath Serhal
  • 7,466
  • 6
  • 44
  • 60