3

I have two tables with some unique and some duplicate entries. I need to UNION those tables and output an additional column, that concatenates a tableidentifier the row was found in. Please find an example here: http://sqlfiddle.com/#!18/dc20c/6/0

My Output should look like this:

tablename || Column1 || Column2
AB           Foo        Bar
A            Donut      Onions
B            Donut      Pickles

Sadly just adding a column to union will make the rows unique and thus not be merged or concatenated. How do I achieve this?

Ksdmg
  • 397
  • 5
  • 15
  • 1
    Just for the sake of mentioning it (tagged sql server 2012, so this isn't an option): in sql server 2017 `STRING_AGG` could be used: http://sqlfiddle.com/#!18/dc20c/36 – Me.Name Feb 01 '18 at 13:50

3 Answers3

3

You could aggregate your results and calculate the value for tablename...

http://sqlfiddle.com/#!18/dc20c/32

SELECT
    MAX(CASE WHEN tablename = 'A' THEN 'A' ELSE '' END)
  + MAX(CASE WHEN tablename = 'B' THEN 'B' ELSE '' END)
  + MAX(CASE WHEN tablename = 'Z' THEN 'Z' ELSE '' END)  AS tablename,
  column1,
  column2
FROM
(
  SELECT
    'A' AS tablename,
    Column1,
    Column2
  FROM
    TableFoo

UNION ALL

  SELECT
    'B' AS tablename,
    Column1,
    Column2
  FROM
    TableBar
)
  TableFooBar
GROUP BY
  column1,
  column2
;

Note that I also use UNION ALL instead of UNION.

UNION has an additional overhead for removing duplicates. This could both impact your results in a way you don't want, and will certainly introduce addition processing costs.

UNION ALL avoids both of those.

MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • Can you tell me if your approach is more performant than @Jayasurya Satheesh answer? I need to aggregate 3 tables with about 5000 rows each, yours is looking to me like it needs more time to complete? – Ksdmg Feb 01 '18 at 14:01
  • 1
    @ksdmg - As with everything, it depends. As the other answer uses a correlated sub-query to generate the `tablename` column, I'd expect this slightly more repetitive code to cost less to execute. You should test that on your actual data to be sure. Over 5000rows though, I doubt you'll see Much difference (it's not that much data). – MatBailie Feb 01 '18 at 14:30
  • 1
    One uses DISTINCT and one uses GROUP BY, with pretty much the same cost. The main difference is the correlated sub-query *(relatively expensive)* vs my use of CASE and aggregates *(cheaper, but more typing)*. – MatBailie Feb 01 '18 at 14:35
  • Thanks for your detailed and fast answer! – Ksdmg Feb 01 '18 at 15:04
1

try This

;WITH Q
AS
(
  SELECT
  'A' AS tablename,
  Column1,
  Column2
FROM
  TableFoo

  UNION

  SELECT
  'B' AS tablename,
  Column1,
  Column2
FROM
  TableBar

)
SELECT DISTINCT
  TableName = replace(STUFF((SELECT ','+TableName  AS [data()]
                     FROM Q T2 
                     WHERE T2.Column1 = T1.Column1
                        AND T2.Column2 = T1.Column2
                     FOR XML PATH('')),1,1,''),',','') ,
  Column1,
  Column2
  FROM Q T1

Check the SQL Fiddle Demo

Jayasurya Satheesh
  • 7,826
  • 3
  • 22
  • 39
0

You can accomplish this with just a union with no nesting.

SELECT
  CASE WHEN (A.COLUMN1 IS NOT NULL AND A.COLUMN2 IS NOT NULL) THEN 'AB' ELSE 'A' END AS tablename,
  B.Column1,
  B.Column2
FROM
  TableFoo B
   LEFT OUTER JOIN TABLEBAR A ON A.COLUMN1=B.COLUMN1 AND A.COLUMN2=B.COLUMN2
  UNION
SELECT
  'B' tablename,
  B.Column1,
  B.Column2
FROM
  TableBar B
      LEFT OUTER JOIN TABLEFOO A ON A.COLUMN1=B.COLUMN1 AND A.COLUMN2=B.COLUMN2
WHERE (A.COLUMN1 IS NULL AND A.COLUMN2 IS NULL)
Ctznkane525
  • 7,297
  • 3
  • 16
  • 40
  • @MatBailie - updated...includes no nesting or aggregation – Ctznkane525 Feb 01 '18 at 13:48
  • Indeed, no aggregation, but does have two outer joins, which would be more expensive than any aggregation. – MatBailie Feb 01 '18 at 13:48
  • nested tables are more expensive than joins on properly indexed tables – Ctznkane525 Feb 01 '18 at 13:50
  • "Nested Tables" are not inherently expensive. SQL is a declarative language, and not an imperative language. "Nested Tables", "Derived Tables", "Sub-Queries", "Inline-Views" are expanded macro-like when an execution plan is being built. They only introduce additional costs when "mis-used", such as filtering or joining on a calculated field that can not be back-tracked to an indexed field. This is the same even without nesting. What you have said is a mis-understanding of how RDBMS actually work. – MatBailie Feb 01 '18 at 13:53
  • An optimization of your approach would simply to use a `FULL OUTER JOIN` instead of two `LEFT OUTER JOIN`s unioned together. Additionally, your second query in the `UNION` *(which should be `UNION ALL`)*, should be a `RIGHT JOIN` or have the tables in the opposite order. Using `FULL OUTER JOIN` ***and*** having indexes on `column1, column2` on both tables ***could*** then outperform the use of any `UNION ALL` approach, but is harder to generalize to more than 2 source tables. – MatBailie Feb 01 '18 at 13:56