3

What are alternatives to UNION ALL for joining many tables to see aggregated data from many source systems? I am looking for solution beyond T-SQL.

Suppose we have 3 source tables and we want to join them by id:

TableA

id  Adata
1   8383
2   2707
3   6181
4   6708

TableB

id  Bdata
1   2669
1   8582
2   6335
2   7355
3   7355
3   2277
4   2789
4   8566

TableC

id  Adata
1   2856
2   1364
3   4476
4   3311
4   8073

The desired result table I want to get:

enter image description here

The primitive way I do now to get it (by UNION ALL):

enter image description here

Reaching desired results by creation of huge table consisting of many tables under each other and many nulls seems to be very frustrating and primitive solution if we have many columns and many tables. I think this is common problem if we want to join different information systems like accounting, payroll, purchase etc.

The final aim is to give and Excel user aggregated data with a chance to drill down limited to selected table. For example by double clicking in desired results table in the first Bdata 11251 the user is intrested to see only the data that make that figure. But instead the user gets this:

enter image description here

Update. I want to preserve a chance for drill down in data. So I want to be able to see all details for each ID if I double click on results table in Excel.

Matt
  • 14,906
  • 27
  • 99
  • 149
Przemyslaw Remin
  • 6,276
  • 25
  • 113
  • 191
  • It seems strange to use Excel when you have a database system. Simply write a small GUI program in some programming language (C#, Java, whatever) for this purpose. – Thorsten Kettner Aug 24 '15 at 09:37
  • @prz I have got your desired output in my answer, however i would suggest you get a SUM row in the output (excel, SSRS etc) – Matt Aug 24 '15 at 09:38

5 Answers5

4

With your example, a simple join seems to be enough, why would you need to UNION ALL ?

SELECT A.id,
    SUM(A.Adata),
    SUM(B.Bdata),
    SUM(C.Cdata)
FROM TableA A
    JOIN TableB B ON A.id = B.id
    JOIN TableC C ON A.id = C.id
GROUP BY A.id

If you want to show the overall total, you can use WITH ROLLUP:

SELECT ISNULL(sub.id,'Total') AS "id",
    sub.Adata,
    sub.Bdata,
    sub.Cdata
FROM (
    SELECT A.id,
        SUM(A.Adata) AS Adata,
        SUM(B.Bdata) AS Bdata,
        SUM(C.Cdata) AS Cdata
    FROM TableA A
        JOIN TableB B ON A.id = B.id
        JOIN TableC C ON A.id = C.id
    GROUP BY A.id WITH ROLLUP) sub
A Hocevar
  • 726
  • 3
  • 17
  • I use UNION ALL to give an Excel user a chance for drill down in a pivot table. Your solution seems not to give a chance to make drill downs (show details in pivot table) for user to see data for single id. – Przemyslaw Remin Aug 24 '15 at 09:23
  • What kind of drill down would you allow your user to perform ? You mean not suming the data at all ? – A Hocevar Aug 24 '15 at 11:22
2

You are miximg concepts. UNION ALL is something rarely needed and used. You use it when glueing similar result sets together. This is not the case here.

In order to join tables, you should use joins of course. You get a result row per group you state with GROUP BY (the ID in your case). You use aggregation functions such as SUM, MAX, COUNT, etc. to aggregate data.

The query to write depends on whether all IDs are to be present in table A and the other tables. The difference is mainly the kind of join used then.

ID must be present in all tables:

select id, sum(a.adata), sum(b.bdata), sum(c.cdata)
from a
join b using (id)
join c using (id)
group by id;

ID must be pesent in table a only:

select id, sum(a.adata), coalesce(sum(b.bdata),0), coalesce(sum(c.cdata),0)
from a
left join b using (id)
left join c using (id)
group by id;

ID doesn't have to exist in any particular table:

select id, coalesce(sum(a.adata),0), coalesce(sum(b.bdata),0), coalesce(sum(c.cdata),0)
from a
full outer join b using (id)
full outer join c using (id)
group by id;

EDIT: I should add that SQL Server doesn't support the USING clause (which is standard SQL2003). You can replace it with an ON clause, which is easy as long as you don't need full outer joins, which are much more complicated without a using clause.

And: You get a sum row at the end by using group by rollup(id) instead of only group by id.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
1

You should use Inner Joins. Union (All) does not seem necessary unless some Ids are missing in some tables:

Select coalesce(cast(i.id as varchar(10)), 'SUM')
    , isnull(sum(A.data), '')
    , isnull(sum(B.data), '')
    , isnull(sum(C.data), '')
From (Select id From A
    Union Select id From B
    Union Select id From C
) as i
left Join A on A.id = i.id
left Join B on B.id = i.id
left Join C on C.id = i.id
group by grouping sets (i.id, ())

Grouping set adds the total row.

Julien Vavasseur
  • 3,854
  • 1
  • 20
  • 29
1

This can be a solution for what you are asking.

select id,Adata,Bdata,SUM(Cdata) as Cdata 
from ((select id,Adata,SUM(Bdata) as Bdata 
   from (A join B using(id)) group by id)T join C using(id)) 
group by id

It might not be an appropriate answer.

Sparrow7000
  • 79
  • 1
  • 11
1

To get your desired output (with the SUM row)

SELECT CAST(A.id AS VARCHAR(100)) AS ID,
(SELECT SUM(Aa.Adata) FROM TableA Aa WHERE A.id = Aa.id GROUP BY Aa.id) AS Adata,
(SELECT SUM(Ba.Bdata) FROM TableB Ba WHERE B.id = Ba.id GROUP BY Ba.id) AS Bdata,
(SELECT SUM(Ca.Cdata) FROM TableC Ca WHERE C.id = Ca.id GROUP BY Ca.id) AS Cdata
FROM TableA A
INNER JOIN TableB B ON A.id = B.id
INNER JOIN TableC C ON A.id = C.id
GROUP BY A.id, B.id, C.id
UNION ALL
SELECT CAST('SUM' AS VARCHAR(100)),
SUM(A.Adata), (SELECT 
SUM(B.Bdata)
FROM TableB B), (SELECT 
SUM(C.Cdata)
FROM TableC C)
FROM TableA A

Output:

ID  Adata   Bdata   Cdata
1   8383    11251   2856
2   2707    6335    1364
3   6181    9632    4476
4   6708    11355   11384
SUM 23979   38573   20080

SQL Fiddle: http://sqlfiddle.com/#!3/ba58d/11/0

Matt
  • 14,906
  • 27
  • 99
  • 149