1

I have XML data stored in a SQL Server 2016 database that I need to query a date range in a stored procedure around the PROJECT_END to return the application_ID's

<PROJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
    <APPLICATION_ID>3012448</APPLICATION_ID>
    <ACTIVITY>D23</ACTIVITY>
    <ADMINISTERING_IC>NU</ADMINISTERING_IC>
    <APPLICATION_TYPE>1</APPLICATION_TYPE>
    <ARRA_FUNDED xsi:nil="true" />
    <BUDGET_START>01/01/1985</BUDGET_START>
    <BUDGET_END>12/31/1985</BUDGET_END>
    <FOA_NUMBER xsi:nil="true" />
    <PROJECT_START>01/01/1985</PROJECT_START>
    <PROJECT_END>12/31/1987</PROJECT_END>
  </row>
</PROJECTS> 

I have the select part but can't seem to query based upon the date range using PROJECT_END. Also would BETWEEN work?

SELECT 
    nref.value('APPLICATION_ID[1]', 'Int') APPLICATION_ID                         
FROM 
    STAGING
CROSS APPLY
    XMLData.nodes('/PROJECTS/row') AS r(nref)
WHERE
    nref.value('PROJECT_END[1]', 'varchar(max)') > '1/1/1987' 
    AND nref.value('PROJECT_END[1]', 'varchar(max)') < '1/1/1990' 

I have no control over the formatting of the date fields, unless I can convert it on the fly. Any help is appreciated.

Shnugo
  • 66,100
  • 9
  • 53
  • 114
Bill
  • 1,423
  • 2
  • 27
  • 51

2 Answers2

2

If you have a date value - why don't you treat it like that? Don't convert this value to a nvarchar(max) - it's a date, so convert it to DATE instead! Doing that, your query should work just fine.

Also, I'd suggest to always use the adapted ISO-8601 format for specifying dates as string literals - e.g. the format YYYYMMDD (no slashes, nothing) - this works for all language/date format settings:

SELECT 
    nref.value('APPLICATION_ID[1]', 'INT') APPLICATION_ID                         
FROM 
    STAGING
CROSS APPLY
    XMLData.nodes('/PROJECTS/row') AS r(nref)
WHERE
    nref.value('PROJECT_END[1]', 'DATE') > '19870101' 
    AND nref.value('PROJECT_END[1]', 'DATE') < '19900101' 
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 1
    The OP states: *I have no control over the formatting of the date fields* – Shnugo Jun 30 '17 at 07:45
  • The usage of `value('..','DATE')` relys on the system's culture settings. If this is other the `DATEFORMAT MDY` this conversion will break... – Shnugo Jun 30 '17 at 08:04
0

Try it like this:

SELECT 
    nref.value('APPLICATION_ID[1]', 'Int') APPLICATION_ID                         
FROM 
    STAGING
CROSS APPLY
    XMLData.nodes('/PROJECTS/row') AS r(nref)
WHERE
    CONVERT(DATE,nref.value('(PROJECT_END/text())[1]', 'nvarchar(max)'),101) >= {d'1987-01-01'} 
    AND CONVERT(DATE,nref.value('(PROJECT_END/text())[1]', 'nvarchar(max)'),101) < {d'1990-01-01'};

Some background:

Your date format is mm/dd/yyyy according to this documentation you need the format code 101 to parse it correctly.

Within XML one should really never use culture dependant date/time formats! So the clear advise was: Change the XML and use ISO8601! But - as you state in your query - this is not under your control...

The reason, why your alphanumerical comparisson did not work properly was probably related to > '1/1/1987' which should have been > '01/01/1987'. Even if this might work, you should try to remain a typed valued in the appropriate type.

Stating a date literally has several options. I prefer the ODBC syntax, other people will prefer '19870101' (unseparated)...

update

In your query I assume, that the original XML consists of several <row> nodes (due to your call to .nodes('/PROJECTS/row'). If the dateformat was specified correctly, you could add an XQuery predicate to this. The result was a derived table with the wanted <row> nodes only. But you have to rely on the system's culture setting, which is evil. Therefore your query has to read the whole lot into a derived table, just to filter some of them out (calling expensive parse operations)... Double penalty...

Community
  • 1
  • 1
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • The original/original xml consists of tens of thousands of rows per year that when you add up all of the years it is close to 3 million. I have parse each years worth of xml data and have stored it as a single row in the database. After I have pulled out the rows innerxml I added back the project and row nodes before saving it to the database. Would it have been better if I just started with the APPLICATION_ID node? or keep the row node and omit the PROJECTS node? – Bill Jun 30 '17 at 11:55
  • And yes I don't have any control over the date format and depending on the hosting (right now mine, but not always) I don't have any control over the systems culture settings. – Bill Jun 30 '17 at 11:55
  • @user1314159 Do you have a really good reason, to keep this lot of data within XML? Why don't you store your data in a normal table? XML is not meant to store such a big amount of **actively used data**. – Shnugo Jul 02 '17 at 21:10