1

So i have an XML file which looks like below

declare @xml
xml= '<ENVELOPE>
    <BILLFIXED>
        <BILLDATE>29-Jun-2019</BILLDATE>
        <BILLREF>123</BILLREF>
        <BILLPARTY>ABC</BILLPARTY>
    </BILLFIXED>
    <BILLOP>200</BILLOP>
    <BILLCL>200</BILLCL>
    <BILLDUE>29-Jun-2019</BILLDUE>
    <BILLOVERDUE>1116</BILLOVERDUE>
    <BILLFIXED>
        <BILLDATE>30-Jun-2019</BILLDATE>
        <BILLREF>April To June -19</BILLREF>
        <BILLPARTY>efg</BILLPARTY>
    </BILLFIXED>
    <BILLOP>100</BILLOP>
    <BILLCL>100</BILLCL>
    <BILLDUE>30-Jun-2019</BILLDUE>
    <BILLOVERDUE>1115</BILLOVERDUE>
</ENVELOPE>

Im trying to read this using openxml

DECLARE  @hDoc AS INT, @SQL NVARCHAR (MAX) 

EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML 

select   BILLDATE, BILLREF, BILLPARTY, BILLOP,BILLCL,BILLDUE, BILLOVERDUE
from OPENXML(@hDoc, '//BILLFIXED') 
WITH  
( 
BillDate [varchar](50) 'BILLDATE', 
BIllREF [varchar](50) 'BILLREF', 
BILLPARTY [varchar](100) 'BILLPARTY' 
,BILLOP [varchar](100) 'BILLOP' 
 BILLCL[varchar](100) 'REFERENCE', 
 BILLDUE [varchar](100) 'BILLDUE', 
 BILLOVERDUE [varchar](100) 'BILLOVERDUE'
) 

It was easy to extract <BILLFIXED> tag but not able to access the siblings tags

<BILLCL>
<BILLDUE>
<BILLOVERDUE>

any help in accessing these tags Thanks

Todd Main
  • 28,951
  • 11
  • 82
  • 146
rajsx
  • 61
  • 8
  • Try `//BILLFIXED | //BILLFIXED/following-sibling::*[name()='BILLCL' or name()='BILLDUE' or name()='BILLOVERDUE']` – LMC Aug 31 '22 at 16:41
  • Tried this, it doesnt work error shows invalid axes name at following sibling – rajsx Aug 31 '22 at 16:48
  • OK, then try `//BILLFIXED/../*[name()='BILLCL' or name()='BILLDUE' or name()='BILLOVERDUE']` or ``//*[name()='BILLCL' or name()='BILLDUE' or name()='BILLOVERDUE']`` – LMC Aug 31 '22 at 16:53
  • So i tried this `//BILLFIXED/../*[name()='BILLCL']` but it shows the value of the first tag for all the rows – rajsx Aug 31 '22 at 17:00
  • 1
    Pivot xml elements grouped by the number of previous BILLFIXED nodes `select max(case when col = 'BILLDATE' then val end) as BILLDATE, max(case when col = 'BILLREF' then val end) as BILLREF, max(case when col = 'BILLPARTY' then val end) as BILLPARTY, max(case when col = 'BILLOP' then val end) as BILLOP, max(case when col = 'BILLCL' then val end) as BILLCL, max(case when col = 'BILLDUE' then val end) as BILLPARTY, max(case when col = 'BILLOVERDUE' then val end) as BILLOVERDUE` – lptr Sep 01 '22 at 00:07
  • 1
    `from (select b.b.value('local-name(.)', 'nvarchar(200)') as col, b.b.value('text()[1]', 'nvarchar(200)') as val, sum(case when b.b.value('local-name(.)', 'nvarchar(200)') = 'BILLFIXED' then 1 else 0 end) over(order by b.b rows unbounded preceding) as grp from @xml.nodes('/ENVELOPE//*') as b(b) ) as x group by grp` – lptr Sep 01 '22 at 00:08

3 Answers3

2

I agree you should not use OPENXML, and instead use .nodes and .value.

Unfortunately, SQL Server does not allow the sibling:: axis in XQuery, which would have made this much easier.

You can do this purely using XQuery, by using the >> positional predicate.

SELECT
      c.value('(BILLDATE /text())[1]', 'VARCHAR(20)') AS BILLDATE
    , c.value('(BILLREF  /text())[1]', 'VARCHAR(20)') AS BILLREF
    , c.value('(BILLPARTY/text())[1]', 'VARCHAR(20)') AS BILLPARTY
    , c.value('let $i := . return (/ENVELOPE/BILLOP     [. >> $i]/text())[1]', 'INT') AS BILLOP
    , c.value('let $i := . return (/ENVELOPE/BILLCL     [. >> $i]/text())[1]', 'INT') AS BILLCL
    , c.value('let $i := . return (/ENVELOPE/BILLDUE    [. >> $i]/text())[1]', 'VARCHAR(20)') AS BILLDUE
    , c.value('let $i := . return (/ENVELOPE/BILLOVERDUE[. >> $i]/text())[1]', 'INT') AS BILLOVERDUE
FROM @xml.nodes('/ENVELOPE/BILLFIXED') AS t(c);

What this does is as follows:

  • Shred just the /ENVELOPE/BILLFIXED nodes.
  • For each of those, return the children as normal.
  • For each sibling, do the following steps:
    • Store the current node in $i.
    • Take the first child node matching the correct name, of the parent,...
    • ...where that node is positioned after $i, so [. >> $i]
    • Take the first child node's text [1]

db<>fiddle

If some of the nodes could be empty or missing, then you also need to check that the sibling node is before the next BILLFIXED node

SELECT
      c.value('(BILLDATE /text())[1]', 'VARCHAR(20)') AS BILLDATE
    , c.value('(BILLREF  /text())[1]', 'VARCHAR(20)') AS BILLREF
    , c.value('(BILLPARTY/text())[1]', 'VARCHAR(20)') AS BILLPARTY
    , c.value('let $i := . let $nxt := (/ENVELOPE/BILLFIXED[. >> $i])[1] return (/ENVELOPE/BILLOP     [. >> $i][. << $nxt]/text())[1]', 'INT') AS BILLOP
    , c.value('let $i := . let $nxt := (/ENVELOPE/BILLFIXED[. >> $i])[1] return (/ENVELOPE/BILLCL     [. >> $i][. << $nxt]/text())[1]', 'INT') AS BILLCL
    , c.value('let $i := . let $nxt := (/ENVELOPE/BILLFIXED[. >> $i])[1] return (/ENVELOPE/BILLDUE    [. >> $i][. << $nxt]/text())[1]', 'VARCHAR(20)') AS BILLDUE
    , c.value('let $i := . let $nxt := (/ENVELOPE/BILLFIXED[. >> $i])[1] return (/ENVELOPE/BILLOVERDUE[. >> $i][. << $nxt]/text())[1]', 'INT') AS BILLOVERDUE
FROM @xml.nodes('/ENVELOPE/BILLFIXED') AS t(c);

db<>fiddle


One final option is to transform the XML into a more normalized structure, and then query that

SELECT
      f.value('(BILLDATE   /text())[1]', 'VARCHAR(20)') AS BILLDATE
    , f.value('(BILLREF    /text())[1]', 'VARCHAR(20)') AS BILLREF
    , f.value('(BILLPARTY  /text())[1]', 'VARCHAR(20)') AS BILLPARTY
    , c.value('(BILLOP     /text())[1]', 'INT') AS BILLOP
    , c.value('(BILLCL     /text())[1]', 'INT') AS BILLCL
    , c.value('(BILLDUE    /text())[1]', 'VARCHAR(20)') AS BILLDUE
    , c.value('(BILLOVERDUE/text())[1]', 'INT') AS BILLOVERDUE
FROM (VALUES(
  @xml.query('
    for $bf in /ENVELOPE/BILLFIXED
    let $nxt := (/ENVELOPE/BILLFIXED[. >> $bf])[1]
    return
      <ENVELOPE>
        {$bf}
        {
            if ($nxt) then
                /ENVELOPE/*[. >> $bf][. << $nxt]
            else
                /ENVELOPE/*[. >> $bf]
        }
      </ENVELOPE>
  ')
) ) v(transformed)
CROSS APPLY v.transformed.nodes('/ENVELOPE') AS t(c)
CROSS APPLY t.c.nodes('BILLFIXED') t2(f);

db<>fiddle

Steps are as follows:

  • Use .query to create a new XML document.
  • For each BILLFIXED node, store that in $bf
  • Store the BILLFIXED node which follows $bf in $nxt
  • Return an ENVELOPE node, containing $bf as well as all nodes which...
    • ... are after $bf
    • ... and before $nxt if there is a $nxt
  • Query that result as normal, using .nodes
Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • appreaciate your answer & explanation + its faster – rajsx Sep 01 '22 at 06:02
  • Can you share any documentation or reference link that might help me to learn more about xquery? – rajsx Sep 01 '22 at 06:03
  • 1
    https://learn.microsoft.com/en-us/sql/xquery/xquery-language-reference-sql-server?view=sql-server-ver16 is a good start. I'm curious how much this option https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=1fe43bcfe99baab793583973ad0131d3 takes compared to mine and Yitzhak's versions – Charlieface Sep 01 '22 at 09:28
  • If any of BILLOP,BILLCL,BILLDUE,BILLOVERDUE is empty then the results might get skewed https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c1c904e7f17d20321d4a4c786284954a – lptr Sep 01 '22 at 10:21
  • True, you could fix that by comparing also to the next `BILLFIXED` node https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=eb22d9078a34554ab9e9965ac96a436c – Charlieface Sep 01 '22 at 10:31
  • The `not($nxt)` in the xpath filter is evaluated for every node and it hurts performance..one option would be to place the check in an if() to conditionally return the nodes after BILLFIXED.. https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=faeaff33e8ff86d7d622f4653577f78c – lptr Sep 01 '22 at 12:09
  • Great answer, +1 from my side! – Yitzhak Khabinsky Sep 02 '22 at 14:00
1

Microsoft proprietary OPENXML() and its companions sp_xml_preparedocument and sp_xml_removedocument are kept just for backward compatibility with the obsolete SQL Server 2000. Their use is diminished just to very few fringe cases. Starting from SQL Server 2005 onwards, it is strongly recommended to re-write your SQL and switch it to XQuery. Also, OPENXML() cannot take advantage of XML indexes while XQuery methods can.

Please try the following solution.

The XML sample is flattened.

So, we are using full power of XQuery.

SQL

DECLARE @xml XML = 
N'<ENVELOPE>
    <BILLFIXED>
        <BILLDATE>29-Jun-2019</BILLDATE>
        <BILLREF>123</BILLREF>
        <BILLPARTY>ABC</BILLPARTY>
    </BILLFIXED>
    <BILLOP>200</BILLOP>
    <BILLCL>200</BILLCL>
    <BILLDUE>29-Jun-2019</BILLDUE>
    <BILLOVERDUE>1116</BILLOVERDUE>
    <BILLFIXED>
        <BILLDATE>30-Jun-2019</BILLDATE>
        <BILLREF>April To June -19</BILLREF>
        <BILLPARTY>efg</BILLPARTY>
    </BILLFIXED>
    <BILLOP>100</BILLOP>
    <BILLCL>100</BILLCL>
    <BILLDUE>30-Jun-2019</BILLDUE>
    <BILLOVERDUE>1115</BILLOVERDUE>
</ENVELOPE>';

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Nmbr 
    , c.value('(./BILLDATE/text())[1]', 'VARCHAR(20)') AS BILLDATE
    , c.value('(./BILLREF/text())[1]', 'VARCHAR(20)') AS BILLREF
    , c.value('(./BILLPARTY/text())[1]', 'VARCHAR(20)') AS BILLPARTY
    , c.value('(/ENVELOPE/*[sql:column("seq.pos")]/text())[1]', 'INT') AS BILLOP
    , c.value('(/ENVELOPE/*[sql:column("seq.pos") + 1]/text())[1]', 'INT') AS BILLCL
    , c.value('(/ENVELOPE/*[sql:column("seq.pos") + 2]/text())[1]', 'VARCHAR(20)') AS BILLDUE
    , c.value('(/ENVELOPE/*[sql:column("seq.pos") + 3]/text())[1]', 'INT') AS BILLOVERDUE
    , seq.pos   -- just to see
FROM @xml.nodes('/ENVELOPE/BILLFIXED') AS t(c)
   CROSS APPLY (SELECT t.c.value('let $n := . return count(/ENVELOPE/*[. << $n[1]]) + 2','INT') AS pos
         ) AS seq;

Output

Nmbr BILLDATE BILLREF BILLPARTY BILLOP BILLCL BILLDUE BILLOVERDUE pos
1 29-Jun-2019 123 ABC 200 200 29-Jun-2019 1116 2
2 30-Jun-2019 April To June -19 efg 100 100 30-Jun-2019 1115 7
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
  • 1
    It is possible to do this purely in XQuery without a cross-join, hard to say which is faster. You definitely don't need `*`, that's going to be slow. Just use the normal node names, for example https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=1fe43bcfe99baab793583973ad0131d3 – Charlieface Aug 31 '22 at 20:14
  • 1
    @Charlieface, followed you advice, and removed `*`. Thanks. By the way, Query cost relative to the batch shows 50% for both versions. – Yitzhak Khabinsky Aug 31 '22 at 20:22
  • 1
    It would, the optimizer hasn't the faintest idea how much it actualy costs to query the XML. Need to time it properly – Charlieface Aug 31 '22 at 20:26
  • thanks for your suggestion & answer Can u share any references or documentation with examples to understand xquery better? – rajsx Sep 01 '22 at 06:04
1

Pivot the xml elements grouped by the number of preceding BILLFIXED nodes, fiddle.

declare @xml
xml= '<ENVELOPE>'+
replicate( cast('
    <BILLFIXED>
        <BILLDATE>29-Jun-2019</BILLDATE>
        <BILLREF>123</BILLREF>
        <BILLPARTY>ABC</BILLPARTY>
    </BILLFIXED>
    <BILLOP>200</BILLOP>
    <BILLCL>200</BILLCL>
    <BILLDUE>29-Jun-2019</BILLDUE>
    <BILLOVERDUE>1116</BILLOVERDUE>
    <BILLFIXED>
        <BILLDATE>30-Jun-2019</BILLDATE>
        <BILLREF>April To June -19</BILLREF>
        <BILLPARTY>efg</BILLPARTY>
    </BILLFIXED>
    <BILLOP>100</BILLOP>
    <BILLCL>100</BILLCL>
    <BILLDUE>30-Jun-2019</BILLDUE>
    <BILLOVERDUE>1115</BILLOVERDUE>' as nvarchar(max)), 100) +'
</ENVELOPE>';


select 
  max(case when col = 'BILLDATE' then val end) as BILLDATE,
  max(case when col = 'BILLREF' then val end) as BILLREF,
  max(case when col = 'BILLPARTY' then val end) as BILLPARTY,
  max(case when col = 'BILLOP' then val end) as BILLOP,
  max(case when col = 'BILLCL' then val end) as BILLCL,
  max(case when col = 'BILLDUE' then val end) as BILLPARTY,
  max(case when col = 'BILLOVERDUE' then val end) as BILLOVERDUE
from
(
select 
  b.b.value('local-name(.)', 'nvarchar(200)') as col, b.b.value('text()[1]', 'nvarchar(200)') as val,
  sum(case when b.b.value('local-name(.)', 'nvarchar(200)') = 'BILLFIXED' then 1 else 0 end) 
  over(order by b.b rows unbounded preceding) as grp
from @xml.nodes('/ENVELOPE//*') as b(b)
) as x
group by grp;
lptr
  • 1
  • 2
  • 6
  • 16