2

I have XML similar to the following in a SQL 2008 database, stored in an XML field. I would like to return a true or false indication if a node exists in a specific section of the XML.

<root>
    <node attribute1='value1' attribute2='value2'>
        <sub1 name='ID' value="1" />
        <sub2 name='project' value="abc" />
        <sub3 name='Lead' value="John" />
    </node>
    <entry attribute1='value1' attribute2='value2'>
        <message>start</message>
    </entry>
    <entry attribute1='value1' attribute2='value2'>
        <attribute name='project' value='done'>
    </entry>    
    <node attribute1='value1'>
        <sub1 name='ID' value="2" />
        <sub2 name='project' value="abc" />
        <sub3 name='Lead' value="John" />
    </node>
    <entry attribute1='value1' attribute2='value2'>
        <message>start</message>
    </entry>
    <node attribute1='value1'>
        <sub1 name='ID' value="3" />
        <sub2 name='project' value="abc" />
        <sub3 name='Lead' value="John" />
    </node>
    <entry attribute1='value1' attribute2='value2'>
        <message>start</message>
    </entry>
    <node attribute1='value1'>
        <sub1 name='ID' value="4" />
        <sub2 name='project' value="abc" />
        <sub3 name='Lead' value="John" />
    </node> 
    <entry attribute1='value1' attribute2='value2'>
        <message>start</message>
    </entry>
    <entry attribute1='value1' attribute2='value2'>
        <attribute name='project' value='done'>
    </entry>
</root>

As you'll notice, the <attribute> node may or may not occur after a node with 'ID'. In this example, you can see it in the first and fourth "sections" for lack of a better term.

With the following table structure:

ID (PK)
EventID (FK)
RawXML (XML)
Created (datetime)

Here is an extract of the SQL/xQuery that I have so far:

WITH XMLNAMESPACES(
  'http://www.w3.org/2001/XMLSchema-instance' as xsi,
),
t1 as(
    SELECT distinct
      x.EventId
    , c.value ('(//node/sub[@name=''ID'']/@value)[1]', 'nvarchar(max)') as ID   
    , c.value ('(//node/sub[@name=''ID''][1][descendant::attribute/@name=''project''])[1]', 'nvarchar(max)' ) as Exists     
    FROM
      Table1 x
    CROSS APPLY
      RawXML.nodes('./.') as t(c)
     )
select distinct
  t1.ID
, t1.Exists
from t1

I will be running the script 4 or more times (incrementing all of the singleton values before each run)

For the XML given, I need to end up with the following results after running the query 4 times: (the values of the IDs will not be know so I can't use them in the query)

    ID    Exists
   ----   -------
    1      true
    2      false
    3      false
    4      true

With the SQL given, I didn't get any errors but it's taking forever (well over 45 minutes) and I haven't even let it finish yet. It really shouldn't take this long to parse the XML.

UPDATE: I limited my query to make sure it was only parsing one row (one XML file) and it finished in 57 seconds. However, I got a result of '0' for ID 1 and ID 2 when I should have had a '1' for ID 1.

And I'm sure most of you are aware that following-sibling, etc isn't supported by SQL Server so unfortunately that's not an option.

Just for reference, I've used this successfully to find the two instances of 'Project' but it ignores where in the xml they occur.:

c.value ('(//node[descendant::attribute/@name=''Project''])[1]', 'nvarchar(max)' ) as TrueFalse

So basically, I need to know if the node with name='Project' exists after a node with name='ID' BUT before the next instance of a node with name='ID'

N1tr0
  • 485
  • 2
  • 6
  • 24

1 Answers1

2

You have some errors in your XML and judging by the query you use I also changed the sub nodes.

You can enumerate your ID and project nodes using row_number() and then check if the "next row" is a project node or a an ID row using regular SQL instead of XQuery.

-- Temp table to hold the extracted values from the XML
create table #C
(
  rn int primary key,
  ID int
);

-- Get the enumerated rows with ID.
-- project nodes will have NULL in ID
insert into #C
  select row_number() over(order by T.N) as rn,
         T.N.value('sub[@name = "ID"][1]/@value', 'int') as ID
  from table1
    cross apply RawXML.nodes('/root/*[sub/@name = "ID" or attribute/@name = "project"]') as T(N)

-- Get the ID's and check if the next row is a project node
select C1.ID,
       case when exists (
                          select *
                          from #C as C2
                          where C1.rn + 1 = C2.rn and 
                          C2.ID is null
                        ) 
         then 1
         else 0
       end as [Exists]
from #C as C1
where C1.ID is not null;

drop table #C;

SQL Fiddle

You can do it without a temp table using a CTE instead but I suspect that the temp table version will be faster.

with C as
(
  select row_number() over(order by T.N) as rn,
          T.N.value('sub[@name = "ID"][1]/@value', 'int') as ID
  from table1
    cross apply RawXML.nodes('/root/*[sub/@name = "ID" or attribute/@name = "project"]') as T(N)
)
select C1.ID,
        case when exists (
                          select * 
                          from C as C2 
                          where C1.rn + 1 = C2.rn and 
                                C2.ID is null
                        ) 
          then 1
          else 0
        end as [Exists]
from C as C1
where C1.ID is not null;

SQL Fiddle

Mikael Eriksson
  • 136,425
  • 22
  • 210
  • 281
  • Thanks Mikael. I will try out your solution to see if I can get it to work. I quickly typed out the XML so there definitely could be an error in there. I can't post my actual XML due to sensitive information. Thanks for the tip on the Temp table as well. I hope that will help with the run times. – N1tr0 Oct 24 '12 at 11:41
  • Well I've tried to get it to work with XML but no luck so far. Right now I'm just trying to see what is going to be inserted into the temp table but I get the 'singleton' error below on the row pulling the Value(), even though I am specifying the first instance [1]. Here's the full error: XQuery [Table1.RawXML.value()]: 'value()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *' – N1tr0 Oct 24 '12 at 12:53
  • OK, I can get it to return results if I move the [1] designation to the @value side like so: `T.N.value('sub[@name = "ID"]/@value[1]', 'int') as ID`. I also had to change the 'cross apply' to this: `cross apply RawXML.nodes('/root/*sub[@name = "ID" or attribute/@name = "project"]')`. The rows are ordered however every row has an ID value associated with it, instead of the NULL for the 'project' rows. – N1tr0 Oct 24 '12 at 13:06
  • @N1tr0 Then your XML does not look like what you have posted. You got to have `attribute` as a child item to `sub` then? If you post something that looks like the XML you have I can have a look at it and see how the query should be modified. – Mikael Eriksson Oct 24 '12 at 13:16
  • OK, I had to play around a bit with my SQL since my source XML is much more complicated than the sample but I got it to work. Thanks Mikael! You rock! – N1tr0 Oct 24 '12 at 19:55
  • @N1tr0 I'm glad to hear you got it working and that you could use the answer. – Mikael Eriksson Oct 24 '12 at 20:01