1

I have the following XML data

<browse result="1">
  <th>
    <td label="Company ID"></td>
    <td label="Company Name"></td>
    <td label="Country"></td>
    <td label="Region"></td>
  </th>
  <tr>
    <td>ABC01</td>
    <td>Company 1</td>
    <td>United States</td>
    <td>North America</td>
  </tr>
  <tr>
    <td>ABC02</td>
    <td>Company 2</td>
    <td>China</td>
    <td>Asia</td>
  </tr>
</browse>

my final table should look like this

Company ID Company Name Country Region
ABC01 Company 1 United States North America
ABC02 Company 2 China Asia

or its pivoted version

label value1 value2
Company ID ABC01 ABC02
Company Name Company 1 Company 2
Country United States China
Region North America Asia

This is what I have tried so far

declare @xmldata nvarchar(4000) = '<browse result="1">
  <th>
    <td label="Company ID"></td>
    <td label="Company Name"></td>
    <td label="Country"></td>
    <td label="Region"></td>
  </th>
  <tr>
    <td>ABC01</td>
    <td>Company 1</td>
    <td>United States</td>
    <td>North America</td>
  </tr>
  <tr>
    <td>ABC02</td>
    <td>Company 2</td>
    <td>China</td>
    <td>Asia</td>
  </tr>
</browse>'

select
     a.value('@label', 'varchar(max)') as label
    ,b.value('.', 'nvarchar(max)') as value
from (select top (1) CAST(REPLACE(CAST(@xmldata AS NVARCHAR(MAX)),'utf-8','utf-16') AS XML) as xmldata) as s
    outer apply s.xmldata.nodes('/browse/th/td') as l(a)
    outer apply s.xmldata.nodes('/browse/tr/td') as v(b)

but it gives me 32 rows, and most of them is wrongly duplicated combinations of the 4 labels with the values. How should I accomplish this?

Dale K
  • 25,246
  • 15
  • 42
  • 71
xE99
  • 62
  • 7
  • Are the column/label names fixed or are they dynamic? Can they change order? – Charlieface Aug 30 '23 at 15:38
  • hi! they are fixed. what is stated in the labels is fixed throughout the whole file – xE99 Aug 30 '23 at 15:40
  • 2
    If they are not fixed you can use some XQuery voodoo to get the position, and then refer back to it in a predicate https://dbfiddle.uk/-jo-Loni – Charlieface Aug 30 '23 at 15:56
  • That really is some voodoo magic there, @Charlieface . Based on the fiddle's name, can we call the "Jo Loni technique"? :) – Thom A Aug 30 '23 at 15:59
  • @ThomA It's an old trick, I used it a few times `https://stackoverflow.com/search?q=user%3A14868997+let+return+count+%5Bsql*%5D` I don't remember where I got it. Also here is the oldest mention I can find of it https://stackoverflow.com/a/9863151/14868997 might have got it from there – Charlieface Aug 30 '23 at 16:04
  • @ThomA Basically you put the node you want in an XQuery variable `$i` then you count through all its sibling nodes that are before it, and add 1. It's really slow by the way, on big files it's often better to just use the more unreliable `row_number` – Charlieface Aug 30 '23 at 16:05

1 Answers1

3

If we can assume that the values are in static positions then you only need to look at the tr nodes, and then get each td node from that using their ordinal positions:

DECLARE @xml xml = '<browse result="1">
  <th>
    <td label="Company ID"></td>
    <td label="Company Name"></td>
    <td label="Country"></td>
    <td label="Region"></td>
  </th>
  <tr>
    <td>ABC01</td>
    <td>Company 1</td>
    <td>United States</td>
    <td>North America</td>
  </tr>
  <tr>
    <td>ABC02</td>
    <td>Company 2</td>
    <td>China</td>
    <td>Asia</td>
  </tr>
</browse>';

SELECT b.th.value('(td/text())[1]','varchar(20)') AS CompanyID,
       b.th.value('(td/text())[2]','varchar(20)') AS CompanyName,
       b.th.value('(td/text())[3]','varchar(20)') AS Country,
       b.th.value('(td/text())[4]','varchar(20)') AS Region
FROM @xml.nodes('browse/tr') b(th);

You'll likely need to adjust the data types, but I very much doubt that varchar(MAX) is appropriate; if you are using IDs that are over 8,000 characters long then you have a problem.

Thom A
  • 88,727
  • 11
  • 45
  • 75
  • may i ask what should be added if some of the td nodes are null? for example ABC03 Asia. in this case, "Asia" would go directly to the Company Name instead of staying in Region. – xE99 Aug 31 '23 at 13:03
  • 1
    Change to `(td[1]/text())[1]`, `(td[2]/text())[1]`, etc, @xE99 . – Thom A Aug 31 '23 at 13:07