1

What I have going on here is:

Table 1: AOC_Model

AOC_ID int (Primary Key)
Model varchar(50)

Table 2: AOC_Chipset

AOC_CHIPSET_ID int (Primary Key)
CONTROLLER_ID int
CHIPSET_ID int
AOC_ID int

Table 3: Controller

CONTROLLER_ID int (Primary Key)
CONTROLLER varchar(10)

Table 4: Chipset

CONTROLLER_ID int (Primary Key)
CHIPSET_ID int (Primary Key)
CHIPSET varchar(50)

Table 5: Notes_Chipset

NOTES_CHIPSET_ID int (Primary Key)
CONTROLLER_ID int
CHIPSET_ID int
DATE date
NOTES varchar(800)

First I have a Chipset table that is joined to Controller via Controller_ID Then I have AOC_Chipset which actually acts as Joining table between Controller and Chipset. AOC_Chipset is joined to Chipset via Controller_ID and Chipset_id Then I have Notes_Chipset which is also joined to Chipset via Controller_ID and Chipset_id And Finally, I have AOC_Model which is joined to AOC_Chipset via AOC_ID

I have many to many relationships going on here. I could have one or two controllers assigned to the same AOC_ID. I could have one or two chipsets assigned to the same controller. I could have multiple Notes assigned to the same Chipset.

I created this query in SQL Server 2019:

SELECT
    dbo.AOC_CHIPSET.AOC_ID, 
    string_agg(dbo.CONTROLLER.CONTROLLER, ', ') AS vControllers, 
    string_agg(dbo.CHIPSET.CHIPSET, ', ') AS vChipsets, 
    string_agg(dbo.NOTES_CHIPSET.DATE, ', ') AS vDate,
    string_agg(dbo.NOTES_CHIPSET.NOTES, ', ') AS vNotes
FROM
    dbo.AOC_CHIPSET 
INNER JOIN 
    dbo.CHIPSET ON dbo.AOC_CHIPSET.CONTROLLER_ID = dbo.CHIPSET.CONTROLLER_ID 
                AND dbo.AOC_CHIPSET.CHIPSET_ID = dbo.CHIPSET.CHIPSET_ID 
INNER JOIN 
    dbo.CONTROLLER ON dbo.CHIPSET.CONTROLLER_ID = dbo.CONTROLLER.CONTROLLER_ID 
INNER JOIN 
    dbo.NOTES_CHIPSET ON dbo.CHIPSET.CONTROLLER_ID = dbo.NOTES_CHIPSET.CONTROLLER_ID  
                      AND dbo.CHIPSET.CHIPSET_ID = dbo.NOTES_CHIPSET.CHIPSET_ID 
                      AND dbo.CONTROLLER.CONTROLLER_ID = dbo.NOTES_CHIPSET.CONTROLLER_ID
GROUP BY 
    dbo.AOC_CHIPSET.AOC_ID

The problem is the result contains duplicates

duplicates

I know I can use the DISTINCT, but I can't figure out where / how to place it in conjunction with string_agg?

peka
  • 51
  • 1
  • 7
  • 1
    Example data and desidered output will be useful! [Please do not post image of data](https://stackoverflow.com/help/how-to-ask) – Renato Feb 09 '20 at 22:26
  • Can you tell more about your schema. Especially, it is important to know which combinations of columns are unique or primary keys in all these tables. – Olivier Jacot-Descombes Feb 10 '20 at 20:18
  • Not sure what I added in my question is good enough. I don't want to sound confusing and not sure how to explain better without providing pictures (like Diagram) or perhaps sample code. It looks like neither is acceptable since someone mentioned above not to post pictures :-( – peka Feb 11 '20 at 18:04
  • Thank you for the additional information. Does the SQL from my answer help? – Olivier Jacot-Descombes Feb 11 '20 at 21:01

3 Answers3

0

I don't fully understand what the purpose/use of this query is, so forgive me if this doesn't help. But I don't think DISTINCT will be of much use to you with your AOC_ID being a unique value in each row of the duplicate data. If you got rid of the AOC_ID, then you could use DISTINCT and not have to worry about that data appearing twice. But again, if getting rid of the AOC_ID in this query is not an option, I'm as stumped as you are.

0
SELECT MAX(AOC_ID) as AOC_ID, vControllers, vChipsets, vDate, vNotes FROM
(
 SELECT
    dbo.AOC_CHIPSET.AOC_ID, 
    string_agg(dbo.CONTROLLER.CONTROLLER, ', ') AS vControllers, 
    string_agg(dbo.CHIPSET.CHIPSET, ', ') AS vChipsets, 
    string_agg(dbo.NOTES_CHIPSET.DATE, ', ') AS vDate,
    string_agg(dbo.NOTES_CHIPSET.NOTES, ', ') AS vNotes
FROM
    dbo.AOC_CHIPSET 
INNER JOIN 
    dbo.CHIPSET ON dbo.AOC_CHIPSET.CONTROLLER_ID = dbo.CHIPSET.CONTROLLER_ID 
                AND dbo.AOC_CHIPSET.CHIPSET_ID = dbo.CHIPSET.CHIPSET_ID 
INNER JOIN 
    dbo.CONTROLLER ON dbo.CHIPSET.CONTROLLER_ID = dbo.CONTROLLER.CONTROLLER_ID 
INNER JOIN 
    dbo.NOTES_CHIPSET ON dbo.CHIPSET.CONTROLLER_ID = dbo.NOTES_CHIPSET.CONTROLLER_ID  
                      AND dbo.CHIPSET.CHIPSET_ID = dbo.NOTES_CHIPSET.CHIPSET_ID 
                      AND dbo.CONTROLLER.CONTROLLER_ID = dbo.NOTES_CHIPSET.CONTROLLER_ID
GROUP BY 
    dbo.AOC_CHIPSET.AOC_ID
) R
GROUP BY vControllers, vChipsets, vDate, vNotes
Agneum
  • 727
  • 7
  • 23
0

I replaced the joins by sub-selects placed directly in the select list. This allows me to select distinct values per retrieved property. In fact, there are always 2 sub-selects per string to create. The inner one has a SELECT DISTINCT and the outer one makes the string_agg. The inner sub-select filters its rows with a WHERE-clause with AOC_ID matching the main SELECT

SELECT
    ac.AOC_ID,
    (SELECT string_agg(CONTROLLER, ', ') FROM
      (SELECT DISTINCT CONTROLLER
       FROM dbo.CONTROLLER co INNER JOIN dbo.AOC_CHIPSET ac1
          ON ac1.CONTROLLER_ID = co.CONTROLLER_ID
       WHERE ac1.AOC_ID = ac.AOC_ID) x) AS vControllers,
    (SELECT string_agg(CHIPSET, ', ') FROM
      (SELECT DISTINCT CHIPSET
       FROM dbo.CHIPSET cs INNER JOIN dbo.AOC_CHIPSET ac2 
          ON ac2.CONTROLLER_ID = cs.CONTROLLER_ID AND ac2.CHIPSET_ID = cs.CHIPSET_ID
       WHERE ac2.AOC_ID = ac.AOC_ID) y) AS vChipsets,
    (SELECT string_agg([DATE], ', ') FROM
      (SELECT DISTINCT [DATE]
       FROM dbo.NOTES_CHIPSET nd INNER JOIN dbo.AOC_CHIPSET ac3
          ON ac3.CONTROLLER_ID = nd.CONTROLLER_ID AND ac3.CHIPSET_ID = nd.CHIPSET_ID
       WHERE ac3.AOC_ID = ac.AOC_ID) z) AS vDate,
    (SELECT string_agg(NOTES, ', ') FROM
      (SELECT DISTINCT NOTES
       FROM dbo.NOTES_CHIPSET nd INNER JOIN dbo.AOC_CHIPSET ac4
          ON ac4.CONTROLLER_ID = nd.CONTROLLER_ID AND ac4.CHIPSET_ID = nd.CHIPSET_ID
       WHERE ac4.AOC_ID = ac.AOC_ID) z) AS vNotes
FROM
    dbo.AOC_CHIPSET ac
GROUP BY
    ac.AOC_ID

The SELECT DISTINCT in the sub-queries work, because the select-list does not contain any controller or chipset id. This would not be possible with joins on the outer level, since those require these ids.

Olivier Jacot-Descombes
  • 104,806
  • 13
  • 138
  • 188
  • Okay, now I reworked my answer completely after having created an example on my SQL-Server with the schema you have added to your question. – Olivier Jacot-Descombes Feb 11 '20 at 22:00
  • To improve on visibility, I am thinking to remove the vDate Column and perhaps use the Date to be a separator under the vNotes. Would it be easy to achieve something like this under the vNotes where each date starts as a new line item followed by ":" space and Comment. I would like to make it look like this: [link](https://photos.app.goo.gl/mycxVJutaFfQrxiG6). I can't think of a way on how to accomplish this task – peka Mar 02 '20 at 22:40
  • It is certainly possible; however, personally, I prefer doing such advanced things in a report generator, rather than in SQL. In SQL you would have to integrate a notes sub-query into the dates sub-query: `SELECT string_agg([DATE], ', ') + ': ' + () FROM ...`, where the notes sub-query would additionally have a date condition in the WHERE clause. But this tends to become more and more unreadable with all these sub-sub-sub-queries. – Olivier Jacot-Descombes Mar 03 '20 at 13:48
  • Ok... Thank you for the info – peka Mar 03 '20 at 23:41