0

I am working with a database that pulls data in from an application. The data appears as XML in the field, but the datatype is actually varchar2. How do I split the data into columns by the names in the brackets? So end result would be each name in <> would be the column header and the value between the closing and ending brackets would be the row data.

Example:

Column Name: VAL_TXT

xml:<objects.CollegeInfo>
  <collegeOrgId>0000000000</collegeOrgId>
  <useCollegeOrgId>0000000000</useCollegeOrgId>
  <collegeName>University [City, ST, USA]</collegeName>
  <collegeCountry>USA</collegeCountry>
  <collegeState>ST</collegeState>
  <fromDate>08/15/2004</fromDate>
  <toDate>05/15/2007</toDate>
  <degreeDate>08/15/2008</degreeDate>
  <gpaType>4PT</gpaType>
  <gradePointAverage>3.5</gradePointAverage>
  <enteredDegree>Bachelor&apos;s</enteredDegree>
</objects.CollegeInfo>

Each row may have different fields in them as well.

Also, this data is stored in an Oracle server, but I am pulling the data in through linked server on MS SQL.

Thanks!

EDIT: I did try the following

SELECT CAST(a.VAL_TXT as xml).value('(/objects.CollegeInfo/collegeOrgId)[1]','INT') AS col1
from TABLE a;

I received the error: Msg 9403, Level 16, State 1, Line 1 XML parsing: line 0, character 0, unrecognized input signature

New EDIT (Second try):

SELECT *
FROM
    (
       SELECT
          VAL_TXT = CAST(VAL_TXT AS XML)
       FROM
          TABLE
    ) t
    CROSS APPLY
    (
       SELECT
          collegeOrgId = x.n.value('collegeOrgId[1]','varchar(20)')
          ,useCollegeOrgId = x.n.value('useCollegeOrgId[1]','varchar(50)')
          ,collegeName = x.n.value('collegeName[1]','VARCHAR(100)')
          ,collegeCountry = x.n.value('collegeCountry[1]','VARCHAR(20)')
          ,collegeState = x.n.value('collegeState[1]','CHAR(2)')
          ,fromDate = x.n.value('fromDate[1]','varchar(30)')
          ,toDate = x.n.value('toDate[1]','varchar(30)')
          ,degreeDate = x.n.value('degreeDate[1]','varchar(30)')
          ,gpaType = x.n.value('gpaType[1]','VARCHAR(20)')
          ,gradePointAverage = x.n.value('gradePointAverage[1]','DECIMAL(5,3)')
          ,enteredDegree = x.n.value('enteredDegree[1]','VARCHAR(20)')
       FROM
          t.VAL_TXT.nodes ('objects.CollegeInfo') as x(n)
    ) c;

Receives error: Msg 9411, Level 16, State 1, Line 8 XML parsing: line 1, character 10, semicolon expected

1 Answers1

0
DECLARE @TableWithXMLAsVarchar AS TABLE (XMLasVarchar VARCHAR(MAX))
INSERT INTO @TableWithXMLAsVarchar VALUES ('<objects.CollegeInfo>
  <collegeOrgId>0000000000</collegeOrgId>
  <useCollegeOrgId>0000000000</useCollegeOrgId>
  <collegeName>University [City, ST, USA]</collegeName>
  <collegeCountry>USA</collegeCountry>
  <collegeState>ST</collegeState>
  <fromDate>08/15/2004</fromDate>
  <toDate>05/15/2007</toDate>
  <degreeDate>08/15/2008</degreeDate>
  <gpaType>4PT</gpaType>
  <gradePointAverage>3.5</gradePointAverage>
  <enteredDegree>Bachelor&apos;s</enteredDegree>
</objects.CollegeInfo>')


SELECT *
FROM
    (
       SELECT
          XMLColumn = CAST(XMLasVarchar AS XML)
       FROM
          @TableWithXMLAsVarchar
    ) t
    CROSS APPLY
    (
       SELECT
          collegeOrgId = x.n.value('collegeOrgId[1]','INT')
          ,useCollegeOrgId = x.n.value('useCollegeOrgId[1]','INT')
          ,collegeName = x.n.value('collegeName[1]','VARCHAR(100)')
          ,collegeCountry = x.n.value('collegeCountry[1]','VARCHAR(20)')
          ,collegeState = x.n.value('collegeState[1]','CHAR(2)')
          ,fromDate = x.n.value('fromDate[1]','DATE')
          ,toDate = x.n.value('toDate[1]','DATE')
          ,degreeDate = x.n.value('degreeDate[1]','DATE')
          ,gpaType = x.n.value('gpaType[1]','VARCHAR(20)')
          ,gradePointAverage = x.n.value('gradePointAverage[1]','DECIMAL(5,3)')
          ,enteredDegree = x.n.value('enteredDegree[1]','VARCHAR(20)')
       FROM
          t.XMLColumn.nodes ('objects.CollegeInfo') as x(n)
    ) c

Basically you kind of had the right idea but instead of selecting the value directly you have to select the value from the nodes. But because this data is in a table you actually have a little more to do. First you will have to cast the column as XML but you can't do that in the FROM statement so you can do it via a Common Table Expression [CTE] or an aliased Derived Table. After that you can CROSS/OUTER APPLY to get all of the columns.

Matt
  • 13,833
  • 2
  • 16
  • 28
  • I altered date data types to avoid some current conversion errors. After changing those I receive the following error: Msg 9411, Level 16, State 1, Line 8 XML parsing: line 1, character 10, semicolon expected – Lindsey Jackson Nov 17 '17 at 20:33
  • @LindseyJackson Yeah you will have to change Datatypes to match your data just wanted to show you the options so you could figure that out. As far as semicolon expected I would have to see your code to know what the syntax difference would be. If you edit your question and append as current try or something I would be glad to take a look. Also I guess the question is where are you trying to run this? This is assuming you are running it in MS SQL. here is a link you can test/play with to show this code functioning as intended: http://rextester.com/RQV19377 – Matt Nov 17 '17 at 20:38
  • I edited post for second try using your help. I am running it in MS SQL. I've obviously edited the table name for TABLE for security reasons. Same for a couple fields. – Lindsey Jackson Nov 20 '17 at 15:02
  • @LindseyJackson I copied your code as pasted and no issue with it. Perhaps it is because you are running query against linked server to ORACLE? If so I am not sure how that would be treated differently unfortunately I have no way of testing that. you could pull the raw data to a temp table on the ms sql-server and then query from there... Or perform the operation on the oracle server and then pull the results I am sure the xml methods would be different on Oracle though.... – Matt Nov 20 '17 at 17:53