0

I am trying to export some information out of an xml column.

Logic applied: In each part, the actor Name may appear twice. (actor + song) result to a unique combination.

Keep only when at aggregation level, scene=1. We don't care about scene=2. We care about the lines for specific actors. Simba, Mufasa and Zazu. Let's not worry about Scar. He is mean. My expected output is:

output

My T-Sql code so far: I have managed to filter out the aggregate level = 2.

Could somebody assist?

DECLARE @Temp AS TABLE (Information xml)
INSERT INTO @Temp SELECT @x;
WITH XMLNAMESPACES ('lyceum' AS ns1)
SELECT
    Z.query('ns1:feature')
FROM @Temp
CROSS APPLY Information.nodes('/ns1:report/ns1:detail/ns1:parts/ns1:part') AS Y(Z)
WHERE Z.exist('ns1:aggregation[.="1"]') = 1

Below you can find the xml in order to have something easily reproducible:

DECLARE @x as xml;
set @x = '<ns1:report xmlns:ns1="lyceum">
    <ns1:header>
        <ns1:report>The Lion King</ns1:report>
    </ns1:header>
    <ns1:detail>
        <ns1:parts>
            <ns1:part>
                <ns1:aggregation>
                    <ns1:scene>1</ns1:scene>
                </ns1:aggregation>
                <ns1:feature>
                    <ns1:actor>Simba</ns1:actor>
                    <ns1:lines>100</ns1:lines>
                    <ns1:song>1</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Mufasa</ns1:actor>
                    <ns1:lines>200</ns1:lines>
                    <ns1:song>1</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Simba</ns1:actor>
                    <ns1:lines>300</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Zazu</ns1:actor>
                    <ns1:lines>400</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Scar</ns1:actor>
                    <ns1:lines>500</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
            </ns1:part>
            <ns1:part>
                <ns1:aggregation>
                    <ns1:scene>2</ns1:scene>
                </ns1:aggregation>
                <ns1:feature>
                    <ns1:actor>Simba</ns1:actor>
                    <ns1:lines>600</ns1:lines>
                    <ns1:song>1</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Mufasa</ns1:actor>
                    <ns1:lines>700</ns1:lines>
                    <ns1:song>1</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Simba</ns1:actor>
                    <ns1:lines>800</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Zazu</ns1:actor>
                    <ns1:lines>900</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Scar</ns1:actor>
                    <ns1:lines>1000</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
            </ns1:part>
        </ns1:parts>
    </ns1:detail>
</ns1:report>'
Thanasis K
  • 125
  • 6

1 Answers1

1

You are asking for XQuery to select and group: Try this:

A mockup table to simulate your issue:

DECLARE @Temp AS TABLE (Information xml);
INSERT @Temp VALUES
('<ns1:report xmlns:ns1="lyceum">
    <ns1:header>
        <ns1:report>The Lion King</ns1:report>
    </ns1:header>
    <ns1:detail>
        <ns1:parts>
            <ns1:part>
                <ns1:aggregation>
                    <ns1:scene>1</ns1:scene>
                </ns1:aggregation>
                <ns1:feature>
                    <ns1:actor>Simba</ns1:actor>
                    <ns1:lines>100</ns1:lines>
                    <ns1:song>1</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Mufasa</ns1:actor>
                    <ns1:lines>200</ns1:lines>
                    <ns1:song>1</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Simba</ns1:actor>
                    <ns1:lines>300</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Zazu</ns1:actor>
                    <ns1:lines>400</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
                <ns1:feature>
                    <ns1:actor>Scar</ns1:actor>
                    <ns1:lines>500</ns1:lines>
                    <ns1:song>2</ns1:song>
                </ns1:feature>
            </ns1:part>
            <ns1:part>
                <ns1:aggregation>
                    <ns1:scene>2</ns1:scene>
                </ns1:aggregation>
                <ns1:feature>
<!-- shortened for brevity -->
                </ns1:feature>
            </ns1:part>
        </ns1:parts>
    </ns1:detail>
</ns1:report>');

--The query

WITH XMLNAMESPACES('lyceum' AS ns1)
SELECT p.query('<root>
                {
                    for $s in distinct-values(ns1:feature/ns1:song/text())
                    return <song scene="{$s}">
                        {
                        for $f in ns1:feature[ns1:song[text()=$s]]
                        return
                        if($f/ns1:actor/text() != "Scar") then
                            <actor name="{$f/ns1:actor/text()}" lines="{$f/ns1:lines/text()}"/>
                        else
                            ()
                        }
                    </song>
                }
                </root>')
  
FROM @Temp t
CROSS APPLY t.Information.nodes('/ns1:report/ns1:detail/ns1:parts/ns1:part[(ns1:aggregation/ns1:scene/text())[1]="1"]') A(p);

The result

<root>
  <song scene="1">
    <actor name="Simba" lines="100" />
    <actor name="Mufasa" lines="200" />
  </song>
  <song scene="2">
    <actor name="Simba" lines="300" />
    <actor name="Zazu" lines="400" />
  </song>
</root>

The idea in short:

  • We us a predicate in .nodes() to return the part(s) where <scene>=1
  • We use .query() to return a newly created XML
  • We add (might be omitted) a <root> node
  • We add a FLWOR query running through a distinct-list of song ids
  • We use this song id in another FLOWR query as predicate to filter the fitting <feature> elements
  • We check for the actor's name not being "Scar"

The output you are asking for is an entirely different problem. Using content as column name demands either for knowing all expected names in advance (PIVOT or conditional aggregation) or to use dynamically created statements.

Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Thank you @Shnugo. You are right about the expected output on my question. It is a different problem indeed. – Thanasis K Jul 31 '20 at 08:17