1

I have a query

SELECT
    CntApp = COUNT(app.ApplicationID)
    ,r.RegionName
    ,d.DistrictName 
FROM dim.Application app
    JOIN dim.Geography g ON (app.ApplicationID  = g.GeographyID)   
       AND (app.CountryId = g.CountryId)
    JOIN dim.Region r   ON r.RegionID = g.RegionID  
    JOIN dim.District d ON d.DistrictId = g.DistrictID
    JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID         
GROUP BY
    r.RegionName
    ,d.DistrictName

and

SELECT
    CntCon = COUNT(c.ContractID)
    ,r.RegionName
    ,d.DistrictName
FROM dim.Contract c  
    JOIN dim.Geography g ON (c.ContractID = g.GeographyID)   
       AND (c.CountryId = g.CountryId)
    JOIN dim.Region r   ON r.RegionID = g.RegionID  
    JOIN dim.District d ON d.DistrictId = g.DistrictID
    JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID         
GROUP BY
    r.RegionName
    ,d.DistrictName

which I want to merge into one table, so the group by still works. The result I want to get:

CntApp | CntCon | RegionName | DistrictName
31       24       Pardubicky   Pardubice
21       16       Pardubicky   Chrudim
...

I've tried UNION ALL but got something like this instead:

CntApp | CntCon | RegionName | DistrictName
    NULL     24       Pardubicky   Pardubice
    21       NULL     Pardubicky   Pardubice
    26       NULL     Pardubicky   Chrudim
    ...
Cœur
  • 37,241
  • 25
  • 195
  • 267
  • So what you get out of it when you UNION it ALL? (+1 for nice drawing and sense of humour ;) –  Jul 03 '17 at 08:57
  • 1
    @Rafal: See edit. –  Jul 03 '17 at 09:01
  • You can optimise your queries by not doing UNION when the queries are virtually identical; you're effectively asking the db to join the same set of tables and count things twice. There are ways to avoid that... (i posted a unionless answer as an alternative) – Caius Jard Jul 03 '17 at 09:37

4 Answers4

2

You need to join 2 subqueries. This way you will get columns of both the queries side by side as you expect.

this should work :

SELECT iq1.CntApp , iq2.CntCon, iq1.iq1.RegionName,iq1.DistrictName 
FROM
(
SELECT
    CntApp = COUNT(app.ApplicationID)
    ,r.RegionName
    ,d.DistrictName 
FROM dim.Application app
    JOIN dim.Geography g ON (app.ApplicationID  = g.GeographyID)   
       AND (app.CountryId = g.CountryId)
    JOIN dim.Region r   ON r.RegionID = g.RegionID  
    JOIN dim.District d ON d.DistrictId = g.DistrictID
    JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID         
GROUP BY
    r.RegionName
    ,d.DistrictName
) iq1
inner join
(
SELECT
    CntCon = COUNT(c.ContractID)
    ,r.RegionName
    ,d.DistrictName
FROM dim.Contract c  
    JOIN dim.Geography g ON (c.ContractID = g.GeographyID)   
       AND (c.CountryId = g.CountryId)
    JOIN dim.Region r   ON r.RegionID = g.RegionID  
    JOIN dim.District d ON d.DistrictId = g.DistrictID
    JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID         
GROUP BY
    r.RegionName
    ,d.DistrictName
) iq2
on
iq1.RegionName = iq2.iq1.RegionName 
and 
iq1.DistrictName = iq2.DistrictName 
Prabhat G
  • 2,974
  • 1
  • 22
  • 31
  • Is INNER JOIN a wise choice here? There's a risk that one subquery will have no rows for a particular region/district pairing and thus rows will be lost from the output – Caius Jard Jul 03 '17 at 09:22
  • 1
    That's why I have highlighted `join` at the top :) Until we don't get proper data from OP, we can't comment on left/right or inner. I chose inner as sample as it won't return any NULL columns. – Prabhat G Jul 03 '17 at 09:26
1

UNION ALL will combine results column by column. You need to introduce fake columns and aggregate it again (or join like in the other solution):

SELECT SUM(CntApp) CntApp, SUM(CntCon) CntCon, RegionName, DistrictName FROM (

    SELECT
        CntApp = COUNT(app.ApplicationID)
        ,CntCon = 0
        ,r.RegionName
        ,d.DistrictName 
    FROM dim.Application app
        JOIN dim.Geography g ON (app.ApplicationID  = g.GeographyID)   
           AND (app.CountryId = g.CountryId)
        JOIN dim.Region r   ON r.RegionID = g.RegionID  
        JOIN dim.District d ON d.DistrictId = g.DistrictID
        JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID         
    GROUP BY
        r.RegionName
        ,d.DistrictName

    UNION ALL

    SELECT
        CntApp = 0
        ,CntCon = COUNT(c.ContractID)
        ,r.RegionName
        ,d.DistrictName
    FROM dim.Contract c  
        JOIN dim.Geography g ON (c.ContractID = g.GeographyID)   
           AND (c.CountryId = g.CountryId)
        JOIN dim.Region r   ON r.RegionID = g.RegionID  
        JOIN dim.District d ON d.DistrictId = g.DistrictID
        JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID         
    GROUP BY
        r.RegionName
        ,d.DistrictName
) d
GROUP BY RegionName, DistrictName
  • I've got the same idea, except I had used NULL instead of CntApp=0. Unfortunetely, it returns the same values both for CntApp and CntCont, which should not. –  Jul 03 '17 at 09:07
  • If this didn't answer your question, why did you mark it as the answer? – underscore_d Jul 03 '17 at 09:37
  • @StanislavJirák - you mean the query I provided returns the same values, or the one were you used NULLs? –  Jul 03 '17 at 09:39
1

You need a FULL JOIN

SELECT coalesce(app.RegionName, c.RegionName) AS RegionName,
       coalesce(app.DistrictName, c.DistrictName) AS DistrictName,
      coalesce(app.CntApp,0) AS CntApp,
      coalesce(c.CntCon,0) AS CntCon
FROM 
    (SELECT
       CntApp = COUNT(app.ApplicationID)
       ,r.RegionName
       ,d.DistrictName 
    FROM dim.Application app
       JOIN dim.Geography g ON (app.ApplicationID  = g.GeographyID)   
         AND (app.CountryId = g.CountryId)
       JOIN dim.Region r   ON r.RegionID = g.RegionID  
       JOIN dim.District d ON d.DistrictId = g.DistrictID
       JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID         
    GROUP BY
       r.RegionName
       ,d.DistrictName
    ) app
    FULL JOIN 
    (
       SELECT
       CntCon = COUNT(c.ContractID)
       ,r.RegionName
       ,d.DistrictName
    FROM dim.Contract c  
       JOIN dim.Geography g ON (c.ContractID = g.GeographyID)   
         AND (c.CountryId = g.CountryId)
       JOIN dim.Region r   ON r.RegionID = g.RegionID  
       JOIN dim.District d ON d.DistrictId = g.DistrictID
       JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID         
    GROUP BY
       r.RegionName
       ,d.DistrictName
    ) c ON app.RegionName = c.RegionName AND app.DistrictName = c.DistrictName
TriV
  • 5,118
  • 2
  • 10
  • 18
0

You can probably ditch the union, and it will be safer because your results wont be affected by stray cartesian joins that might occur if bad data works its way into the g/r/d/z tables:

SELECT
 CntApp,
 CntCon,
 r.RegionName,
 d.DistrictName 
FROM 
 dim.Geography g  
 INNER JOIN dim.Region r   ON r.RegionID = g.RegionID  
 INNER JOIN dim.District d ON d.DistrictId = g.DistrictID
 INNER JOIN dim.ZIPcode z  ON g.ZIPcodeID = z.ZIPcodeID 

 LEFT JOIN (SELECT ApplicationID, CountryID, COUNT(*) CntApp FROM dim.Application GROUP BY ApplicationID, CountryID) app
   ON (app.ApplicationID  = g.GeographyID) AND (app.CountryId = g.CountryId)

 LEFT JOIN (SELECT ContractID, CountryId, COUNT(*) as CntCon FROM dim.Contract GROUP BY ContractID, CountryId) c    
   ON (c.ContractID = g.GeographyID) AND (c.CountryId = g.CountryId)

Here's a bit of education point for you though:

If you have two blocks of data (from table, query, whatever) and you want to conenct them together vertically (more rows) then you use UNION If you want to conenct them together horizontally (more columns), you use JOIN

If we have:

a,b,c
a,b,c
a,b,c

And

a,y,z
a,y,z
a,y,z

This is what you get with UNION:

a,b,c
a,b,c
a,b,c
a,y,z
a,y,z
a,y,z

And this is what you get with JOIN:

a,b,c,y,z
a,b,c,y,z
a,b,c,y,z

Remember this, is will serve you well

Caius Jard
  • 72,509
  • 5
  • 49
  • 80