0

I've found a lot of questions in here but none seems to resolve it. I do want to retrieve unique values with STRING_AGG() in SQL Server without using the keyword WITH.

This is my query:

SELECT DISTINCT 
    bld.Code AS building_code, 
    -- CONCAT(bld.Name, ' - via ', dir.Name) AS building_name,
    STRING_AGG(CAST(buildings.evaluation AS NVARCHAR(MAX)), ', ') AS [data] 
FROM
    Dealer de 
INNER JOIN 
    Road ro ON de.ID = ro.Dealer_ID  
INNER JOIN 
    Direction dir ON ro.ID = dir.Road_ID  
INNER JOIN 
    Building bld ON dir.ID = bld.Direction_ID  
INNER JOIN 
    BuildingType bt ON bld.BuildingType_ID = bt.ID  
INNER JOIN 
    StructuralUnit su ON bld.ID = su.Building_ID  
INNER JOIN 
    Span sp ON su.ID = sp.StructuralUnit_ID  
INNER JOIN 
    UnitElement ue ON sp.ID = ue.Span_ID  
INNER JOIN 
    (SELECT DISTINCT TOP 8 
         bld.Code AS building_code, 
         CONCAT(bld.Name, ' - via ', dir.Name) AS building_name,            
         dir.Name AS direction, 
         ROUND(AVG(ins.Evaluation), 2) AS evaluation, 
         YEAR(ig.Date) AS year      
     FROM  
         Building bld   
     INNER JOIN 
         Direction dir ON bld.Direction_ID = dir.ID     
     INNER JOIN 
         Road ro ON dir.Road_ID = ro.ID     
     INNER JOIN 
         Dealer de ON ro.Dealer_ID = de.ID  
     INNER JOIN 
         StructuralUnit su ON bld.ID = su.Building_ID   
     INNER JOIN 
         Span sp ON su.ID = sp.StructuralUnit_ID    
     INNER JOIN 
         UnitElement ue ON sp.ID = ue.Span_ID   
     INNER JOIN 
         Inspection ins ON ue.ID = ins.UnitElement_ID   
     INNER JOIN 
         InspectionGroup ig ON ig.ID = ins.InspectionGroup_ID   
     WHERE 
         ue.Status = 3      
         AND de.ID = 1      
         AND YEAR(ig.Date) IN (2021, 2020, 2019, 2018)      
         AND ig.InspectionTypeModel_ID <> 3     
     GROUP BY 
         bld.Code, bld.Name, dir.Name, YEAR(ig.Date)    
    ORDER BY 
         bld.Code, YEAR(ig.Date)) buildings ON buildings.building_code = bld.Code 
WHERE 
    ue.Status = 3   
    AND de.ID = 1  
GROUP BY 
    bld.Code --, bld.Name, dir.Name 
ORDER BY 
    bld.Code

The subquery (the one with the ROUND(AVG(ins.Evaluation), 2) AS evaluation) produces the correct distinct values:

enter image description here

But the full query doesn't work the way I though (with the DISTINCT and GROUP BY I though values would be unique, just 4 per row)

enter image description here

My desired results would be like this (don't mind the 3 dots at the end, I just resized the screen to show you only 4 values, that's how it's supposed to be)

enter image description here

Any suggestion, without using WITH?

Dale K
  • 25,246
  • 15
  • 42
  • 71
alesssz
  • 384
  • 4
  • 18
  • 4
    Why do you think `WITH` is needed here, and if so why would it be a problem ? Maybe it would be better if you explained what the exact you are trying to achieve here, sample data and expected outcome would help very very much here – GuidoG Sep 01 '22 at 06:45
  • As per the question guide, please do not post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text. – Dale K Sep 01 '22 at 07:17

1 Answers1

0

Your main issue is that you are cross joining to your subquery. Your subquery has 4 rows for each building code, but you are only joining on code, therefore for every row in your outer query, you are returning 4 extra rows because of the subquery. If your data varies this could go up to as many as 8 (because of the TOP 8 in the subquery, without this there's more or less no limit to the duplication).

As far as I can tell you can massively simplify the query, none of the outer query is required at all, and with a small change to the grouping on the subquery you can get the required data without all the additional joins. The TOP also seems unnecessary, with the proper grouping you would only ever get a maximum 4 rows per code anyway (one per year in the IN clause):

SELECT  b.Code,
        Data = STRING_AGG(b.evaluation, ',') WITHIN GROUP(ORDER BY b.Year)
FROM    (
            SELECT  bld.Code,
                    Year = YEAR(ig.Date),
                    evaluation = ROUND(AVG(ins.Evaluation), 2)
            FROM    Building AS bld
                    INNER JOIN Direction AS dir
                        ON bld.Direction_ID = dir.ID
                    INNER JOIN Road AS ro
                        ON dir.Road_ID = ro.ID
                    INNER JOIN Dealer AS de
                        ON ro.Dealer_ID = de.ID
                    INNER JOIN StructuralUnit AS su
                        ON bld.ID = su.Building_ID
                    INNER JOIN Span AS sp
                        ON su.ID = sp.StructuralUnit_ID
                    INNER JOIN UnitElement AS ue
                        ON sp.ID = ue.Span_ID
                    INNER JOIN Inspection AS ins
                        ON ue.ID = ins.UnitElement_ID
                    INNER JOIN InspectionGroup AS ig
                        ON ig.ID = ins.InspectionGroup_ID
            WHERE   ue.Status = 3
            AND     de.ID = 1
            AND     YEAR(ig.Date) IN (2021, 2020, 2019, 2018)
            AND     ig.InspectionTypeModel_ID <> 3
            GROUP BY
                    bld.Code, YEAR(ig.Date)
        ) AS b
GROUP BY b.Code;
GarethD
  • 68,045
  • 10
  • 83
  • 123
  • I tried this quickly and it seems to work pretty fine! Few questions, so I can understand correctly: - what `WITHING GROUP` is doing with `STRING_AGG`? - What does it means to put a = inside the select? Like `Year = YEAR(ig.Date)`? Is this the same to write `YEAR(ig.Date) AS Year`? – alesssz Sep 01 '22 at 07:03
  • 1
    `WITHIN GROUP` just means that you can specify an order so that your values will always be from oldest to newest. You can of course order by anything you like but the data in your example was ordered by year so this is what I did. More on this in the [documentation](https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16). – GarethD Sep 01 '22 at 07:08
  • 1
    Yes `Year = YEAR(ig.Date)` is the same as `YEAR(ig.Date) AS Year`, sorry, I use SQL Prompt and when it formats it converts all aliases to this layout as it is my personal preference, so after formatting my answer this was done automatically. I'd normally make more of an effort to retain any conventions/styles used in the question. My reasons for this preference closely align with those listed in [this article](https://sqlblog.org/2012/01/23/bad-habits-to-kick-using-as-instead-of-for-column-aliases), but I appreciate that this is very subjective. – GarethD Sep 01 '22 at 07:09
  • It's all clear now, thank you very much, it works well! Thank you for the explanation! – alesssz Sep 01 '22 at 07:17
  • 1
    `AND YEAR(ig.Date) IN (2021, 2020, 2019, 2018)` probably better if you do `AND ig.Date >= '20180101' AND ig.Date < '20220101'` – Charlieface Sep 01 '22 at 11:15