2

I have a table with data:

Customers

  • Sequence
  • ID
  • many other columns (not important)

Sample data:

Sequence ID
-----------
214906 2613
214906 2614
214906 2615
214907 2613
214907 2614
214907 2615
214908 2613
214908 2614
214908 2615
214000 2613
213004 4444
111111 5555
111111 5556
111112 5556 
111112 5555

How can I get the desired result below?

214906 2613
214907 2614
214908 2615
214000 2613
213004 4444
111111 5555
111112 5556

I tried various stuff with ROW_NUMBER() OVER(PARTITION BY Sequence) but it did not help because I need to take row 1 in first group, row 2 in second group etc. In other words, I need to somehow spread those Sequences across ID's. I cannot partition by ID's either because they might appear more than once in the table

John
  • 218
  • 4
  • 8
  • What is the logic, when group `7` has only `3` IDs for example? Thanks. – Zhorov May 28 '19 at 11:30
  • @Zhorov If a sequence has 3 IDs it means there will be two more sequences with the same 3 IDs. If a sequence has two IDs, there will be one more sequence with the same two IDs. So they are kinda grouped. – John May 28 '19 at 11:39
  • What is the logic behind `214000 2613`? What happens when a rollover takes place? This is not clear from your question. – Tim Biegeleisen May 28 '19 at 11:44
  • @TimBiegeleisen 214000 is a unique value and for this reason no grouping is required. And, for instance, 2149008 appears three times, every time with a unique ID and it means there are two more Sequences appearing three times each with the same set of ID's as for 2149008, in my example it's 214906 and 214907. Hopefully, it is more understandable now – John May 28 '19 at 11:50
  • 1
    No, it's not more understandable. There are tons of edge cases I see here which are not addressed. Maybe the best thing to do would be to fix your data model. – Tim Biegeleisen May 28 '19 at 11:55
  • Do you mean to pivot the data as per your question : >I tried various stuff with ROW_NUMBER() OVER(PARTITION BY Sequence) but it did not help because I need to take row 1 in first group, row 2 in second group etc. In other words, I need to somehow spread those Sequences across ID's. If not, then the answer Zhorov provided would suffice – Attie Wagner May 28 '19 at 12:47
  • @Zhorov You solution is pretty good, however [link](https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=6691ad6486bcc1a9ae13b18aaef8744a) in case if we add one more input record (112233, 4444) then it does not work properly. I would expect to see the output of 8 rows, containing both 213004 and 112233 – John May 28 '19 at 13:39
  • Ah, got it. Just did this: SELECT Sequence, ID FROM RankCTE WHERE RankNo = RowNo OR or SequenceCnt = 1. Thank you! – John May 28 '19 at 13:42
  • 1
    @Evgeny I'm glad you have found a solution. I've made an update, which does almost the same. You may check updated answer.Thanks. – Zhorov May 28 '19 at 14:23

1 Answers1

2

I hope I understand you correctly. I use the count of IDs per sequence as a group factor (using SUM() with OVER clause without ORDER BY) and after that appropriate ranking and row numbering:

Input:

CREATE TABLE #Data (
    Sequence int,
    ID int
)
INSERT INTO #Data 
    (Sequence, ID)
VALUES
    (214906, 2613),
    (214906, 2614),
    (214906, 2615),
    (214907, 2613),
    (214907, 2614),
    (214907, 2615),
    (214908, 2613),
    (214908, 2614),
    (214908, 2615),
    (214000, 2613),
    (213004, 4444),
    (111111, 5555),
    (111111, 5556),
    (111112, 5556), 
    (111112, 5555)

T-SQL:

;WITH SequenceCTE AS (
    SELECT 
        *,
        COUNT(*) OVER (PARTITION BY Sequence) AS SequenceCnt
    FROM #Data
), RankCTE AS (
    SELECT 
        *,
        DENSE_RANK() OVER (PARTITION BY SequenceCnt, Sequence ORDER BY SequenceCnt, ID) AS RankNo,
        ROW_NUMBER() OVER (PARTITION BY SequenceCnt, ID ORDER BY Sequence, ID) AS RowNo
    FROM SequenceCTE 
)
SELECT Sequence, ID
FROM RankCTE
WHERE RankNo = RowNo

Output:

----------------
Sequence    ID
----------------
214000      2613
213004      4444
111111      5555
111112      5556
214906      2613
214907      2614
214908      2615

Update (special case with one ID in a sequence):

;WITH SequenceCTE AS (
    SELECT 
        *,
        COUNT(*) OVER (PARTITION BY Sequence) AS SequenceCnt
    FROM #Data
), RankCTE AS (
    SELECT 
        *,
        CASE 
            WHEN SequenceCnt = 1 THEN 1
            ELSE DENSE_RANK() OVER (PARTITION BY SequenceCnt, Sequence ORDER BY SequenceCnt, ID) 
        END AS RankNo,
        CASE 
            WHEN SequenceCnt = 1 THEN 1
            ELSE ROW_NUMBER() OVER (PARTITION BY SequenceCnt, ID ORDER BY Sequence, ID) 
        END AS RowNo
    FROM SequenceCTE 
)
SELECT Sequence, ID
FROM RankCTE
WHERE RankNo = RowNo
Zhorov
  • 28,486
  • 6
  • 27
  • 52