How to access the next sibling of an xml tag in openxml sql

Question

So i have an XML file which looks like below

declare @xml
xml= '<ENVELOPE>
    <BILLFIXED>
        <BILLDATE>29-Jun-2019</BILLDATE>
        <BILLREF>123</BILLREF>
        <BILLPARTY>ABC</BILLPARTY>
    </BILLFIXED>
    <BILLOP>200</BILLOP>
    <BILLCL>200</BILLCL>
    <BILLDUE>29-Jun-2019</BILLDUE>
    <BILLOVERDUE>1116</BILLOVERDUE>
    <BILLFIXED>
        <BILLDATE>30-Jun-2019</BILLDATE>
        <BILLREF>April To June -19</BILLREF>
        <BILLPARTY>efg</BILLPARTY>
    </BILLFIXED>
    <BILLOP>100</BILLOP>
    <BILLCL>100</BILLCL>
    <BILLDUE>30-Jun-2019</BILLDUE>
    <BILLOVERDUE>1115</BILLOVERDUE>
</ENVELOPE>

Im trying to read this using openxml

DECLARE  @hDoc AS INT, @SQL NVARCHAR (MAX) 

EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML 

select   BILLDATE, BILLREF, BILLPARTY, BILLOP,BILLCL,BILLDUE, BILLOVERDUE
from OPENXML(@hDoc, '//BILLFIXED') 
WITH  
( 
BillDate [varchar](50) 'BILLDATE', 
BIllREF [varchar](50) 'BILLREF', 
BILLPARTY [varchar](100) 'BILLPARTY' 
,BILLOP [varchar](100) 'BILLOP' 
 BILLCL[varchar](100) 'REFERENCE', 
 BILLDUE [varchar](100) 'BILLDUE', 
 BILLOVERDUE [varchar](100) 'BILLOVERDUE'
)

It was easy to extract <BILLFIXED> tag but not able to access the siblings tags

<BILLCL>
<BILLDUE>
<BILLOVERDUE>

any help in accessing these tags Thanks

Try `//BILLFIXED | //BILLFIXED/following-sibling::*[name()='BILLCL' or name()='BILLDUE' or name()='BILLOVERDUE']` — LMC, Aug 31 '22 at 16:41
Tried this, it doesnt work error shows invalid axes name at following sibling — rajsx, Aug 31 '22 at 16:48
OK, then try `//BILLFIXED/../*[name()='BILLCL' or name()='BILLDUE' or name()='BILLOVERDUE']` or ``//*[name()='BILLCL' or name()='BILLDUE' or name()='BILLOVERDUE']`` — LMC, Aug 31 '22 at 16:53
So i tried this `//BILLFIXED/../*[name()='BILLCL']` but it shows the value of the first tag for all the rows — rajsx, Aug 31 '22 at 17:00
Pivot xml elements grouped by the number of previous BILLFIXED nodes `select max(case when col = 'BILLDATE' then val end) as BILLDATE, max(case when col = 'BILLREF' then val end) as BILLREF, max(case when col = 'BILLPARTY' then val end) as BILLPARTY, max(case when col = 'BILLOP' then val end) as BILLOP, max(case when col = 'BILLCL' then val end) as BILLCL, max(case when col = 'BILLDUE' then val end) as BILLPARTY, max(case when col = 'BILLOVERDUE' then val end) as BILLOVERDUE` — lptr, Sep 01 '22 at 00:07
`from (select b.b.value('local-name(.)', 'nvarchar(200)') as col, b.b.value('text()[1]', 'nvarchar(200)') as val, sum(case when b.b.value('local-name(.)', 'nvarchar(200)') = 'BILLFIXED' then 1 else 0 end) over(order by b.b rows unbounded preceding) as grp from @xml.nodes('/ENVELOPE//*') as b(b) ) as x group by grp` — lptr, Sep 01 '22 at 00:08

Charlieface · Accepted Answer · 2022-09-01T12:14:52.160

I agree you should not use OPENXML, and instead use .nodes and .value.

Unfortunately, SQL Server does not allow the sibling:: axis in XQuery, which would have made this much easier.

You can do this purely using XQuery, by using the >> positional predicate.

SELECT
      c.value('(BILLDATE /text())[1]', 'VARCHAR(20)') AS BILLDATE
    , c.value('(BILLREF  /text())[1]', 'VARCHAR(20)') AS BILLREF
    , c.value('(BILLPARTY/text())[1]', 'VARCHAR(20)') AS BILLPARTY
    , c.value('let $i := . return (/ENVELOPE/BILLOP     [. >> $i]/text())[1]', 'INT') AS BILLOP
    , c.value('let $i := . return (/ENVELOPE/BILLCL     [. >> $i]/text())[1]', 'INT') AS BILLCL
    , c.value('let $i := . return (/ENVELOPE/BILLDUE    [. >> $i]/text())[1]', 'VARCHAR(20)') AS BILLDUE
    , c.value('let $i := . return (/ENVELOPE/BILLOVERDUE[. >> $i]/text())[1]', 'INT') AS BILLOVERDUE
FROM @xml.nodes('/ENVELOPE/BILLFIXED') AS t(c);

What this does is as follows:

Shred just the /ENVELOPE/BILLFIXED nodes.
For each of those, return the children as normal.
For each sibling, do the following steps:
- Store the current node in $i.
- Take the first child node matching the correct name, of the parent,...
- ...where that node is positioned after $i, so [. >> $i]
- Take the first child node's text [1]

db<>fiddle

If some of the nodes could be empty or missing, then you also need to check that the sibling node is before the next BILLFIXED node

SELECT
      c.value('(BILLDATE /text())[1]', 'VARCHAR(20)') AS BILLDATE
    , c.value('(BILLREF  /text())[1]', 'VARCHAR(20)') AS BILLREF
    , c.value('(BILLPARTY/text())[1]', 'VARCHAR(20)') AS BILLPARTY
    , c.value('let $i := . let $nxt := (/ENVELOPE/BILLFIXED[. >> $i])[1] return (/ENVELOPE/BILLOP     [. >> $i][. << $nxt]/text())[1]', 'INT') AS BILLOP
    , c.value('let $i := . let $nxt := (/ENVELOPE/BILLFIXED[. >> $i])[1] return (/ENVELOPE/BILLCL     [. >> $i][. << $nxt]/text())[1]', 'INT') AS BILLCL
    , c.value('let $i := . let $nxt := (/ENVELOPE/BILLFIXED[. >> $i])[1] return (/ENVELOPE/BILLDUE    [. >> $i][. << $nxt]/text())[1]', 'VARCHAR(20)') AS BILLDUE
    , c.value('let $i := . let $nxt := (/ENVELOPE/BILLFIXED[. >> $i])[1] return (/ENVELOPE/BILLOVERDUE[. >> $i][. << $nxt]/text())[1]', 'INT') AS BILLOVERDUE
FROM @xml.nodes('/ENVELOPE/BILLFIXED') AS t(c);

db<>fiddle

One final option is to transform the XML into a more normalized structure, and then query that

SELECT
      f.value('(BILLDATE   /text())[1]', 'VARCHAR(20)') AS BILLDATE
    , f.value('(BILLREF    /text())[1]', 'VARCHAR(20)') AS BILLREF
    , f.value('(BILLPARTY  /text())[1]', 'VARCHAR(20)') AS BILLPARTY
    , c.value('(BILLOP     /text())[1]', 'INT') AS BILLOP
    , c.value('(BILLCL     /text())[1]', 'INT') AS BILLCL
    , c.value('(BILLDUE    /text())[1]', 'VARCHAR(20)') AS BILLDUE
    , c.value('(BILLOVERDUE/text())[1]', 'INT') AS BILLOVERDUE
FROM (VALUES(
  @xml.query('
    for $bf in /ENVELOPE/BILLFIXED
    let $nxt := (/ENVELOPE/BILLFIXED[. >> $bf])[1]
    return
      <ENVELOPE>
        {$bf}
        {
            if ($nxt) then
                /ENVELOPE/*[. >> $bf][. << $nxt]
            else
                /ENVELOPE/*[. >> $bf]
        }
      </ENVELOPE>
  ')
) ) v(transformed)
CROSS APPLY v.transformed.nodes('/ENVELOPE') AS t(c)
CROSS APPLY t.c.nodes('BILLFIXED') t2(f);

db<>fiddle

Steps are as follows:

Use .query to create a new XML document.
For each BILLFIXED node, store that in $bf
Store the BILLFIXED node which follows $bf in $nxt
Return an ENVELOPE node, containing $bf as well as all nodes which...
- ... are after $bf
- ... and before $nxt if there is a $nxt
Query that result as normal, using .nodes

Can you share any documentation or reference link that might help me to learn more about xquery? — rajsx, Sep 01 '22 at 06:03
https://learn.microsoft.com/en-us/sql/xquery/xquery-language-reference-sql-server?view=sql-server-ver16 is a good start. I'm curious how much this option https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=1fe43bcfe99baab793583973ad0131d3 takes compared to mine and Yitzhak's versions — Charlieface, Sep 01 '22 at 09:28
If any of BILLOP,BILLCL,BILLDUE,BILLOVERDUE is empty then the results might get skewed https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c1c904e7f17d20321d4a4c786284954a — lptr, Sep 01 '22 at 10:21
True, you could fix that by comparing also to the next `BILLFIXED` node https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=eb22d9078a34554ab9e9965ac96a436c — Charlieface, Sep 01 '22 at 10:31
The `not($nxt)` in the xpath filter is evaluated for every node and it hurts performance..one option would be to place the check in an if() to conditionally return the nodes after BILLFIXED.. https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=faeaff33e8ff86d7d622f4653577f78c — lptr, Sep 01 '22 at 12:09

Yitzhak Khabinsky · Answer 2 · 2022-08-31T20:21:20.700

Microsoft proprietary OPENXML() and its companions sp_xml_preparedocument and sp_xml_removedocument are kept just for backward compatibility with the obsolete SQL Server 2000. Their use is diminished just to very few fringe cases. Starting from SQL Server 2005 onwards, it is strongly recommended to re-write your SQL and switch it to XQuery. Also, OPENXML() cannot take advantage of XML indexes while XQuery methods can.

Please try the following solution.

The XML sample is flattened.

So, we are using full power of XQuery.

SQL

DECLARE @xml XML = 
N'<ENVELOPE>
    <BILLFIXED>
        <BILLDATE>29-Jun-2019</BILLDATE>
        <BILLREF>123</BILLREF>
        <BILLPARTY>ABC</BILLPARTY>
    </BILLFIXED>
    <BILLOP>200</BILLOP>
    <BILLCL>200</BILLCL>
    <BILLDUE>29-Jun-2019</BILLDUE>
    <BILLOVERDUE>1116</BILLOVERDUE>
    <BILLFIXED>
        <BILLDATE>30-Jun-2019</BILLDATE>
        <BILLREF>April To June -19</BILLREF>
        <BILLPARTY>efg</BILLPARTY>
    </BILLFIXED>
    <BILLOP>100</BILLOP>
    <BILLCL>100</BILLCL>
    <BILLDUE>30-Jun-2019</BILLDUE>
    <BILLOVERDUE>1115</BILLOVERDUE>
</ENVELOPE>';

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Nmbr 
    , c.value('(./BILLDATE/text())[1]', 'VARCHAR(20)') AS BILLDATE
    , c.value('(./BILLREF/text())[1]', 'VARCHAR(20)') AS BILLREF
    , c.value('(./BILLPARTY/text())[1]', 'VARCHAR(20)') AS BILLPARTY
    , c.value('(/ENVELOPE/*[sql:column("seq.pos")]/text())[1]', 'INT') AS BILLOP
    , c.value('(/ENVELOPE/*[sql:column("seq.pos") + 1]/text())[1]', 'INT') AS BILLCL
    , c.value('(/ENVELOPE/*[sql:column("seq.pos") + 2]/text())[1]', 'VARCHAR(20)') AS BILLDUE
    , c.value('(/ENVELOPE/*[sql:column("seq.pos") + 3]/text())[1]', 'INT') AS BILLOVERDUE
    , seq.pos   -- just to see
FROM @xml.nodes('/ENVELOPE/BILLFIXED') AS t(c)
   CROSS APPLY (SELECT t.c.value('let $n := . return count(/ENVELOPE/*[. << $n[1]]) + 2','INT') AS pos
         ) AS seq;

Output

Nmbr	BILLDATE	BILLREF	BILLPARTY	BILLOP	BILLCL	BILLDUE	BILLOVERDUE	pos
1	29-Jun-2019	123	ABC	200	200	29-Jun-2019	1116	2
2	30-Jun-2019	April To June -19	efg	100	100	30-Jun-2019	1115	7

It is possible to do this purely in XQuery without a cross-join, hard to say which is faster. You definitely don't need `*`, that's going to be slow. Just use the normal node names, for example https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=1fe43bcfe99baab793583973ad0131d3 — Charlieface, Aug 31 '22 at 20:14
@Charlieface, followed you advice, and removed `*`. Thanks. By the way, Query cost relative to the batch shows 50% for both versions. — Yitzhak Khabinsky, Aug 31 '22 at 20:22
It would, the optimizer hasn't the faintest idea how much it actualy costs to query the XML. Need to time it properly — Charlieface, Aug 31 '22 at 20:26
thanks for your suggestion & answer Can u share any references or documentation with examples to understand xquery better? — rajsx, Sep 01 '22 at 06:04

score 1 · Answer 3 · answered Sep 01 '22 at 10:50

Pivot the xml elements grouped by the number of preceding BILLFIXED nodes, fiddle.

declare @xml
xml= '<ENVELOPE>'+
replicate( cast('
    <BILLFIXED>
        <BILLDATE>29-Jun-2019</BILLDATE>
        <BILLREF>123</BILLREF>
        <BILLPARTY>ABC</BILLPARTY>
    </BILLFIXED>
    <BILLOP>200</BILLOP>
    <BILLCL>200</BILLCL>
    <BILLDUE>29-Jun-2019</BILLDUE>
    <BILLOVERDUE>1116</BILLOVERDUE>
    <BILLFIXED>
        <BILLDATE>30-Jun-2019</BILLDATE>
        <BILLREF>April To June -19</BILLREF>
        <BILLPARTY>efg</BILLPARTY>
    </BILLFIXED>
    <BILLOP>100</BILLOP>
    <BILLCL>100</BILLCL>
    <BILLDUE>30-Jun-2019</BILLDUE>
    <BILLOVERDUE>1115</BILLOVERDUE>' as nvarchar(max)), 100) +'
</ENVELOPE>';


select 
  max(case when col = 'BILLDATE' then val end) as BILLDATE,
  max(case when col = 'BILLREF' then val end) as BILLREF,
  max(case when col = 'BILLPARTY' then val end) as BILLPARTY,
  max(case when col = 'BILLOP' then val end) as BILLOP,
  max(case when col = 'BILLCL' then val end) as BILLCL,
  max(case when col = 'BILLDUE' then val end) as BILLPARTY,
  max(case when col = 'BILLOVERDUE' then val end) as BILLOVERDUE
from
(
select 
  b.b.value('local-name(.)', 'nvarchar(200)') as col, b.b.value('text()[1]', 'nvarchar(200)') as val,
  sum(case when b.b.value('local-name(.)', 'nvarchar(200)') = 'BILLFIXED' then 1 else 0 end) 
  over(order by b.b rows unbounded preceding) as grp
from @xml.nodes('/ENVELOPE//*') as b(b)
) as x
group by grp;

How to access the next sibling of an xml tag in openxml sql

3 Answers3

Linked

Related