2

I have ugly XML that looks like this:

<?xml version="1.0"?>
<MainTag xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" RecordID="201801026543210">
    <Field Name="TheFieldName">
        <FieldValue>Field Contents</FieldValue>
        <ListTag>
            <ListItem>
                <Value>This List Value</Value>
                <Source>source.txt</Source>
                <ListType>text</ListType>
                <ItemNumber>6912</ItemNumber>
                <MoreData>some text here</MoreData>
                <Address>address data</Address>
                <Ranking>102</Ranking>
            </ListItem>
            <ListItem>
                <Value>Another List Value</Value>
                <Source>other.txt</Source>
                <ListType>text</ListType>
                <ItemNumber>7919</ItemNumber>
                <MoreData>more text here</MoreData>
                <Address>address data</Address>
                <Ranking>41</Ranking>
            </ListItem>
        </ListTag>
    </Field>
</MainTag>

What I want is query results that gives me a spreadsheet, essentially:

RecordID        FieldName       FieldValue      ListValue           ListSource  ListType    ListItemNumber  …
201801026543210 TheFieldName    Field Contents  This List Value     source.txt  text        6912    
201801026543210 TheFieldName    Field Contents  Another List Value  other.txt   text        7919    

To aid in the fun, the XML is stored in a varchar field, not an XML field.

As example data and queries I've tried:

DECLARE @Tbl TABLE ( 
    TblID varchar(15)
    , Fld varchar(max) 
)
INSERT INTO @Tbl SELECT '201801026543210', '<?xml version="1.0"?><MainTag xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" RecordID="201801026543210"><Field Name="TheFieldName"><FieldValue>Field Contents</FieldValue><ListTag><ListItem><Value>This List Value</Value><Source>source.txt</Source><ListType>text</ListType><ItemNumber>6912</ItemNumber><MoreData>some text here</MoreData><Address>address data</Address><Ranking>102</Ranking></ListItem><ListItem><Value>Another List Value</Value><Source>other.txt</Source><ListType>text</ListType><ItemNumber>7919</ItemNumber><MoreData>more text here</MoreData><Address>address data</Address><Ranking>41</Ranking></ListItem></ListTag></Field></MainTag>'


-- this shows that i am reading the main tag
SELECT t.r.value('@RecordID','varchar(15)') AS RecordID
    , t.r.query('.') as fullvalue
FROM (
    SELECT top 10 CONVERT(xml, Fld) AS Fld, TblID
    FROM @Tbl 
) AS s
CROSS APPLY Fld.nodes('/MainTag') AS t(r)


-- and this works to read the attribute from the first field
SELECT t.r.value('@RecordID', 'varchar(18)') AS RecordID
    , f.r.value('@Name', 'varchar(100)') AS Field
FROM (
    SELECT top 10 CONVERT(xml, Fld) AS Fld, TblID
    FROM @Tbl 
) AS s
CROSS APPLY Fld.nodes('/MainTag') AS t(r)
CROSS APPLY Fld.nodes('/MainTag/Field') AS f(r)


-- this does NOT work to read the second field
SELECT t.r.value('@RecordID','varchar(18)') AS RecordID
    , f.r.value('.', 'varchar(100)') AS ValueF
    , p.r.query('.') AS QueryP
    , p.r.value('.', 'varchar(100)') AS ValueP
FROM (
    SELECT top 10 CONVERT(xml, Fld) AS Fld, TblID
    FROM @Tbl 
) AS s
CROSS APPLY Fld.nodes('/MainTag') AS t(r)
CROSS APPLY Fld.nodes('/MainTag/Field') AS f(r)
CROSS APPLY Fld.nodes('/MainTag/FieldValue') AS p(r)


-- i honestly feel like this should be using nodes off of the parent nodes method
-- , but this is NOT working either
SELECT t.r.value('@RecordID','varchar(18)') AS RecordID
    , p.r.query('.') AS FieldValue
FROM (
    SELECT top 10 CONVERT(xml, Fld) AS Fld, TblID
    FROM @Tbl 
) AS s
CROSS APPLY s.Fld.nodes('/MainTag') AS t(r)
CROSS APPLY t.r.nodes('/MainTag/FieldValue') AS p(r)

I've seen some other questions/samples that grab just the first entry. I do not want that. I want a big blob of data. I want the unique fields to repeat for each "ListItem". To say that another way: I know this is akin to a one-to-many join where the fields in the "one" table will repeat for each row in the "many" table.

Technically, in my data there may be more than one "Field" as well.
There is only one FieldValue for that field.
There is one or more ListItems.

Shnugo
  • 66,100
  • 9
  • 53
  • 114
isopropanol
  • 445
  • 4
  • 10
  • btw: This is not an *ugly* XML... This is a rather beautiful XML acutally :-) – Shnugo Mar 20 '18 at 16:06
  • btw2: This is a good question: sample, MCVE, own attempts... +1 from my side – Shnugo Mar 20 '18 at 16:30
  • The part I was getting stuck on was FieldValue. I realized after doing the indents for this question that I was trying to access it at /MainTag/FieldValue, but it lives at /MainTag/Field/FieldValue. – isopropanol Mar 20 '18 at 16:45

1 Answers1

2

Try it with this query:

SELECT s.Fld.value('(/MainTag/@RecordID)[1]','bigint') AS RecordID
      ,A.f.value('@Name','nvarchar(max)') as FieldName
      ,A.f.value('(FieldValue/text())[1]','nvarchar(max)') as FieldValue
      ,B.li.value('(Value/text())[1]','nvarchar(max)') as ListValue
      ,B.li.value('(Source/text())[1]','nvarchar(max)') as ListSource
      --and so on
FROM (
    SELECT top 10 CONVERT(xml, Fld) AS Fld, TblID
    FROM @Tbl 
) AS s
CROSS APPLY Fld.nodes('/MainTag/Field') AS A(f)
CROSS APPLY A.f.nodes('ListTag/ListItem') AS B(li)

Hint

If there is any chance to change storage from VARCHAR to XML it's worth it...

Shnugo
  • 66,100
  • 9
  • 53
  • 114