Merge multiple rows with same ID into one row

Question

How can I merge multiple rows with same ID into one row.

When value in first and second row in the same column is the same or when there is value in first row and NULL in second row. I don't want to merge when value in first and second row in the same column is different.

I have table:

ID |A    |B    |C
1   NULL  31    NULL
1   412   NULL  1
2   567   38    4
2   567   NULL  NULL
3   2     NULL  NULL
3   5     NULL  NULL
4   6     1     NULL
4   8     NULL  5
4   NULL  NULL  5

I want to get table:

ID |A    |B    |C
1   412   31    1
2   567   38    4
3   2     NULL  NULL
3   5     NULL  NULL
4   6     1     NULL
4   8     NULL  5
4   NULL  NULL  5

what have you tried? What happens if you have more than 2 rows each with different data in the two rows? How do you choose which row to take data from over another? — Kritner, Feb 17 '15 at 19:34
Why in your desired result do you keep the last row (`4 NULL NULL 5`)? Shouldn't it be merged with the previous row (`4 8 NULL 5`)? — Vladimir Baranov, Feb 17 '15 at 22:07
@VladimirBaranov No, it shouldn't be merged, because of row (`4 6 1 NULL`) having a conflict in the second value `6` with the `8` in the next row. By the explanation given, then no rows for that ID should be merged. — ErikE, Feb 18 '15 at 17:38

score 7 · Accepted Answer · answered Feb 18 '15 at 01:54

7

I think there's a simpler solution to the above answers (which is also correct). It basically gets the merged values that can be merged within a CTE, then merges that with the data not able to be merged.

WITH CTE AS (
    SELECT
        ID,
        MAX(A) AS A,
        MAX(B) AS B,
        MAX(C) AS C
    FROM dbo.Records
    GROUP BY ID
    HAVING MAX(A) = MIN(A)
        AND MAX(B) = MIN(B)
        AND MAX(C) = MIN(C)
)
    SELECT *
    FROM CTE
    UNION ALL
    SELECT *
    FROM dbo.Records
    WHERE ID NOT IN (SELECT ID FROM CTE)

SQL Fiddle: http://www.sqlfiddle.com/#!6/29407/1/0

answered Feb 18 '15 at 01:54

Jason W

13,026
3
31
62

Good work seeing a simpler way. For what it's worth, `NOT IN` gives me the willies. A `NOT EXISTS` seems better. :) – ErikE Feb 18 '15 at 17:47
As long as the `NOT IN` uses a field that is not null, the performance will be identical. In this case doing so on a primary key, we are safe. Both produce a LEFT ANTI SEMI JOIN by the optimizer. I would definitely avoid it if I know the field to have nulls, and in practice, on a field not indexed or varchars. – Jason W Feb 18 '15 at 19:55
I have just seen over and over where use of `NOT IN` leads developers to a different kind of thinking than `JOIN` or `EXISTS` do, and in the long run that type of thinking isn't as facile and doesn't produce as well-structured queries (and can ultimately have performance problems if the engine doesn't convert it to a join). – ErikE Feb 18 '15 at 20:16

ErikE · Answer 2 · 2015-02-18T23:51:12.550

WITH Collapsed AS (
   SELECT
      ID,
      A = Min(A),
      B = Min(B),
      C = Min(C)
   FROM
      dbo.MyTable
   GROUP BY
      ID
   HAVING
      EXISTS (
         SELECT Min(A), Min(B), Min(C)
         INTERSECT
         SELECT Max(A), Max(B), Max(C)
      )
)
SELECT
   *
FROM
   Collapsed
UNION ALL
SELECT
   *
FROM
   dbo.MyTable T
WHERE
   NOT EXISTS (
      SELECT *
      FROM Collapsed C
      WHERE T.ID = C.ID
);

See this working in a SQL Fiddle

This works by creating all the mergeable rows through the use of Min and Max--which should be the same for each column within an ID and which usefully exclude NULLs--then appending to this list all the rows from the table that couldn't be merged. The special trick with EXISTS ... INTERSECT allows for the case when a column has all NULL values for an ID (and thus the Min and Max are NULL and can't equal each other). That is, it functions like Min(A) = Max(A) AND Min(B) = Max(B) AND Min(C) = Max(C) but allows for NULLs to compare as equal.

Here's a slightly different (earlier) solution I gave that may offer different performance characteristics, and being more complicated, I like less, but being a single flowing query (without a UNION) I kind of like more, too.

WITH Collapsible AS (
   SELECT
      ID
   FROM
      dbo.MyTable
   GROUP BY
      ID
   HAVING
      EXISTS (
         SELECT Min(A), Min(B), Min(C)
         INTERSECT
         SELECT Max(A), Max(B), Max(C)
      )
), Calc AS (
   SELECT
      T.*,
      Grp = Coalesce(C.ID, Row_Number() OVER (PARTITION BY T.ID ORDER BY (SELECT 1)))
   FROM
      dbo.MyTable T
      LEFT JOIN Collapsible C
         ON T.ID = C.ID
)
SELECT
   ID,
   A = Min(A),
   B = Min(B),
   C = Min(C)
FROM
   Calc
GROUP BY
   ID,
   Grp
;

This is also in the above SQL Fiddle.

This uses similar logic as the first query to calculate whether a group should be merged, then uses this to create a grouping key that is either the same for all rows within an ID or is different for all rows within an ID. With a final Min (Max would have worked just as well) the rows that should be merged are merged because they share a grouping key, and the rows that shouldn't be merged are not because they have distinct grouping keys over the ID.

Depending on your data set, indexes, table size, and other performance factors, either of these queries may perform better, though the second query has some work to do to catch up, with two sorts instead of one.

Haven't looked through all of this but as for `HAVING Min(A) = Max(A) AND Min(B) = Max(B) AND Min(C) = Max(C)`, I would probably replace it with `HAVING EXISTS (SELECT Min(A), Min(B), Min(C) INTERSECT SELECT Max(A), Max(B), Max(C))` because of cases like `1, 2, NULL; 1, NULL, NULL` (i.e. where at least one column is NULL across all collapsable rows). — Andriy M, Feb 18 '15 at 19:08
You're right about an all-`NULL` column for one `ID`. Is my update better now (see the SQL fiddle)? I also considered `NOT EXISTS (SELECT Min(A), Min(B), Min(C) EXCEPT SELECT Max(A), Max(B), Max(C))`--do you have comments on that for me (curious if I'm overlooking something where it would not work the same). — ErikE, Feb 18 '15 at 19:26
Everything seems fine now. I was thinking of ways to avoid potential repetition of evaluation of `Collapsible` and was trying to replace the non-collapsible part of your query, but in the end my take on it would require UNION instead of UNION ALL. Oh well, I guess this is as good as can be. :) Good job anyway. — Andriy M, Feb 18 '15 at 21:56
dba.se people sometimes scoff at the SQL questions here on SO, but some of them really offer fun puzzles, like this one. :) — ErikE, Feb 18 '15 at 23:45

peter.petrov · Answer 3 · 2015-02-17T19:38:33.513

0

You can try something like this:

select 
isnull(t1.A, t2.A) as A,
isnull(t1.B, t2.B) as B,
isnull(t1.C, t2.C) as C
from
table_name t1
join table_name t2 on t1.ID = t2.ID and .....

You mention the concepts of first and second. How do
you define this order? Place that order defining condition
in here: .....

Also, I assume you have exactly 2 rows for each ID value.

edited Feb 17 '15 at 19:38

answered Feb 17 '15 at 19:33

peter.petrov

38,363
16
94
159

1

COALESCE would be better than ISNULL here – Codeman Feb 17 '15 at 19:36
@Pheonixblade9 That could be, yes. I doubt that's the main thing/point in this question though. – peter.petrov Feb 17 '15 at 19:37
Yupper, that's why I made a comment rather than adding my own answer - you got it right, I just think COALESCE is better. – Codeman Feb 17 '15 at 19:41
Please re-read the question. The order of the rows isn't significant as there is no merging/collapsing if *any* pair of rows bears conflicting values. I think a good answer to the question will offer a query that does handle more than two rows and runs correctly against the sample data, rather than using the vague "and some join conditions here that make it all work correctly". That reminds me more of [then a M. O.](http://star.psy.ohio-state.edu/coglab/Miracle.html). – ErikE Feb 17 '15 at 20:40

Merge multiple rows with same ID into one row

3 Answers3

See this working in a SQL Fiddle

Linked