1

I have a XML file stored in a XML datatype column data in my table records.

The table looks like this:

create table records 
(
     id int,
     type nvarchar(28),
     data xml,
     posted datetime
)

XML data:

<Properties>
    <data>
        <Name>novel</Name>
        <Gender>Female</Gender>
        <Age>32</Age>
        <Salary>55k</Salary>
        <Phone>123-123</Phone>
    </data>
</Properties>

I am currently using following query to extract data from that XML column which is taking more than minutes in 20K records.

select
    id,
    posteddate,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Name)') ) AS Name,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Gender)') ) AS Gender,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Age)') ) AS Age,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Salary)') ) AS Salary,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Phone)') ) AS Phone
from 
    records
where 
    type = 'personnel_xml'

Can anybody help explain how can I optimize this scenario as I need to extract 100 such elements from my XML stored as a column.

Dale K
  • 25,246
  • 15
  • 42
  • 71
OmGanesh
  • 952
  • 1
  • 12
  • 24

2 Answers2

4

Assuming you have multiple <data> within the XML. Notice I added an expanded XML file which will have two sets.

Declare @table table (id int,data xml)
Insert Into @table values (1,'<Properties><data><Name>novel</Name><Gender>Female</Gender><Age>32</Age><Salary>55k</Salary><Phone>123-123</Phone></data>
<data><Name>Another Name</Name><Gender>Male</Gender><Age>45</Age><Salary>75k</Salary><Phone>555-1212</Phone></data>
</Properties>')

;with cte as (
      Select ID
            ,RN   = Row_Number() over (Partition By ID Order By (Select Null))
            ,Data = m.query('.') 
      From   @table AS t
      Cross Apply t.Data.nodes('/Properties/data') AS A(m)
 )
Select ID
      ,RN
      ,Name   = Data.value('(data/Name)[1]'  ,'nvarchar(500)')
      ,Gender = Data.value('(data/Gender)[1]','nvarchar(500)')
      ,Age    = Data.value('(data/Age)[1]'   ,'nvarchar(500)')
      ,Salary = Data.value('(data/Salary)[1]','nvarchar(500)')
      ,Phone  = Data.value('(data/Phone)[1]' ,'nvarchar(500)')
 From  cte

Returns

ID  RN  Name            Gender  Age     Salary  Phone
1   1   novel           Female  32      55k     123-123
1   2   Another Name    Male    45      75k     555-1212
Dale K
  • 25,246
  • 15
  • 42
  • 71
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
  • http://stackoverflow.com/users/1570000/john-cappelletti my xml has single data element so in that case i hope i dont need cross apply. So, from the perspective of performance, which one should be choosen, value() or query() ? – OmGanesh Dec 15 '16 at 03:21
  • @JohnCappelleti what is the difference in performance in 'storing the whole xml in #temp table and extracting from #temp' OR 'directly extracting from table.column without using #temp' for my scenario of 200s of elements without repeating attribute – OmGanesh Dec 15 '16 at 03:59
  • @LearnByExample Correct, since you have single data element there is no need for the cross apply. and since you know the path value would be faster than query. – John Cappelletti Dec 15 '16 at 10:12
  • @LearnByExample Regarding your #temp. Hard to say. One test is worth a thousand expert opinions :) – John Cappelletti Dec 15 '16 at 10:14
3

To get a value out of XML in SQL Server you should use the value() Method (xml Data Type). And for untyped XML you should specify the text() node to get better performance.

select R.id,
       R.posted,
       R.data.value('(/Properties/data/Name/text())[1]', 'nvarchar(500)') as Name,
       R.data.value('(/Properties/data/Gender/text())[1]', 'nvarchar(10)') as Gender,
       R.data.value('(/Properties/data/Age/text())[1]', 'int') as Age,
       R.data.value('(/Properties/data/Salary/text())[1]', 'nvarchar(10)') as Salary,
       R.data.value('(/Properties/data/Phone/text())[1]', 'nvarchar(30)') as Phone
from dbo.records as R
where type = N'personnel_xml';
Mikael Eriksson
  • 136,425
  • 22
  • 210
  • 281
  • 1
    Hi, do you know, if the engine is smart enough to find, that `(/Properties/data/` is the same in all cases? I thought it was faster either to use something like John Cappelletti with `.query()` in a CTE, or - even better - `CROSS APPLY` on `.nodes(N'/Properties/data')` and then use `.value()` on `(Name/text())[1]`. Won't it navigate through the whole `XPath` over and over in this case? – Shnugo Dec 06 '16 at 07:53
  • @Shnugo Hi, I have done some performance testing regarding using `cross apply nodes` to get a shorter expression in the values function and that is slower (not much). Using `query()` is not good at all since the `UDX` operator that creates the new XML is slowing things down. Not sure if this holds true for *all* shapes and forms of XML but it does for all I have tried. – Mikael Eriksson Dec 06 '16 at 08:01
  • @Shnugo BTW, the entire XML is available for the value function even if you use `cross apply nodes`. What you have is something called the "context node" which basically is a hierarchical ID that is used as a parameter to the value function. Apparently it is not much faster to find nodes using that instead of a path expression in the internal representation of the XML. At least not so much faster that it compensate for the cost of the extra function call because of the `cross apply nodes`. – Mikael Eriksson Dec 06 '16 at 08:03
  • Thank you for the quick response! I had thought, that navigating with a fix starting point (the *context node*) would be faster... This might depend on the depth and complexity... When I've got some extra time, I'll test this out. – Shnugo Dec 06 '16 at 08:05
  • @Shnugo my only focus is on performance. xml is pretty simple but only have huge number of elements (almost 200 elements) separated on 4 heading elements like 50 <> elements...50 <> elements and the number of table rows is above 100K . Can you recommend some tools that i could use to measure/test the running time. – OmGanesh Dec 15 '16 at 03:56
  • @MikaelEriksson Do you have any opinion whether to use current node or context node? The depth of my xml is only 4 levels (at max). – OmGanesh Dec 15 '16 at 04:09
  • I tried all the way but still not getting good perfromance. Tried creating xml index on the column but it reduces the performance. Any other thoughts? – Dharmendra Kumar Jun 03 '22 at 14:27