
I don't know if this has already been answered, but with my bad luck (or my hunting techniques) I can't find it anywhere on Stack Overflow. Kindly ignore my spamming.

We have a requirement where we need to write an API parser that works for any API giving XML output.

We will not know the XML structure beforehand.

The solution should convert the XML file and save it in a generic T-SQL table, with the XML element/attribute names as the first row.

So basically it is an XML deserializer for any API.

We cannot use any third-party DLL in our C# class.

I have no idea about C#, so I don't know if it is possible or not. But I have been able to write a generic XML-to-row converter in T-SQL using OPENXML. The problem with the T-SQL solution is that we are unable to import a huge XML file successfully into the database.

I can supply any details that are required. Kindly let me know in the comments/answers.

I don't wish anyone to write the code for me; any suitable pointers would be enough.

Resources: JSON

[
{
        "id" : 21953,
        "mainReqIdentity" : "xxxx",
        "itemName" : "xxxx",
        "kanbanPhase" : "xxxx",
        "kanbanStatus" : "xxxx",
        "backlogItemType" : "xxxx",
        "identityDomain" : "xxxx",
        "fromDatetime" : "2016-08-05 17:52:34",
        "teams" : [],
        "releases" : [{
                "id" : 1229,
                "release_name" : "xxxx",
                "release_connection_type" : "xxxx"
            }
        ],
        "fpReleases" : [],
        "sources" : [{
                "sourceName" : "xxxx",
                "sourceRecordUrl" : "xxxx",
                "sourceRecordIdentity" : "xxxx"
            }
        ],
        "productNumbers" : [],
        "tags" : [],
        "productComponents" : [],
        "ranPlatforms" : [],
        "subReleases" : [],
        "requirementAreaId" : xxxx,
        "requirementArea" : "xxxx",
        "toBeHandledAtxxxx" : "xxxx"
    }, {
        "id" : 22014,
        "mainReqIdentity" : "xxxx",
        "itemName" : "xxxx",
        "kanbanPhase" : "xxxx",
        "kanbanStatus" : "xxxx",
        "backlogItemType" : "xxxx",
        "identityDomain" : "xxxx",
        "fromDatetime" : "2016-08-05 17:52:34",
        "teams" : [],
        "releases" : [{
                "id" : xxxx,
                "release_name" : "xxxx",
                "release_connection_type" : "xxxx"
            }
        ],
        "fpReleases" : [],
        "sources" : [{
                "sourceName" : "xxxx",
                "sourceRecordUrl" : "xxxx",
                "sourceRecordIdentity" : "xxxx"
            }
        ],
        "productNumbers" : [],
        "tags" : [],
        "productComponents" : [],
        "ranPlatforms" : [],
        "subReleases" : [],
        "requirementAreaId" : xxxx,
        "requirementArea" : "xxxx",
        "f0Date" : "2015-10-01",
        "f1Date" : "2015-10-01",
        "f2Date" : "2016-02-01",
        "f4Date" : "2016-03-31",
        "fgDate" : "2016-04-29",
        "toBeHandledAtxxxx" : "xxxx"
    }
    ]

XML: 2 samples

Sample 1

    <root type="array">
    <id type="number">21286</id>
    <mainReqIdentity type="string">xxxxxx</mainReqIdentity>
    <itemName type="string">xxxxxx</itemName>
    <kanbanPhase type="string">xxxxxx</kanbanPhase>
    <kanbanStatus type="string">xxxxxx</kanbanStatus>
    <kanbanNote type="string">xxxxxx</kanbanNote>
    <backlogItemType type="string">xxxxxx</backlogItemType>
    <identityDomain type="string">xxxxxx</identityDomain>
    <fromDatetime type="string">2016-08-23 17:01:52</fromDatetime>
    <teams type="array">
      <item type="object">
        <team_name type="string">xxxxxx</team_name>
        <preliminary type="boolean">xxxxxx</preliminary>
      </item>
    </teams>
    <releases type="array">
      <item type="object">
        <id type="number">xxxxxx</id>
        <release_name type="string">xxxxxx</release_name>
        <release_connection_type type="string">xxxxxx</release_connection_type>
      </item>
    </releases>
    <fpReleases type="array">
    </fpReleases>
    <sources type="array">
      <item type="object">
        <sourceName type="string">xxxxxx</sourceName>
        <sourceRecordUrl type="string">xxxxxx</sourceRecordUrl>
      </item>
    </sources>
    <productNumbers type="array">
    </productNumbers>
    <tags type="array">
    </tags>
    <productComponents type="array">
    </productComponents>
    <ranPlatforms type="array">
    </ranPlatforms>
    <subReleases type="array">
    </subReleases>
    <requirementAreaId type="number">xxxxxx</requirementAreaId>
    <requirementArea type="string">xxxxxx</requirementArea>
    <itemContact type="string">xxxxxx</itemContact>
    <toBeHandledAtxxx type="string">xxxxxx</toBeHandledAtLuca>
  </item>
    <item type="object">
    <id type="number">xxxxxx</id>
    <mainReqIdentity type="string">xxxxxx</mainReqIdentity>
    <itemName type="string">xxxxxx</itemName>
    <kanbanPhase type="string">xxxxxx</kanbanPhase>
    <kanbanStatus type="string">xxxxxx</kanbanStatus>
    <kanbanNote type="string">xxxxxx</kanbanNote>
    <backlogItemType type="string">xxxxxx</backlogItemType>
    <identityDomain type="string">xxxxxx</identityDomain>
    <fromDatetime type="string">2016-08-23 17:01:52</fromDatetime>
    <teams type="array">
      <item type="object">
        <team_name type="string">xxxxxx</team_name>
        <preliminary type="boolean">xxxxxx</preliminary>
      </item>
    </teams>
    <releases type="array">
      <item type="object">
        <id type="number">xxxxxx</id>
        <release_name type="string">xxxxxx</release_name>
        <release_connection_type type="string">xxxxxx</release_connection_type>
      </item>
    </releases>
    <fpReleases type="array">
    </fpReleases>
    <sources type="array">
      <item type="object">
        <sourceName type="string">xxxxxx</sourceName>
        <sourceRecordUrl type="string">xxxxxx</sourceRecordUrl>
      </item>
    </sources>
    <productNumbers type="array">
    </productNumbers>
    <tags type="array">
    </tags>
    <productComponents type="array">
    </productComponents>
    <ranPlatforms type="array">
    </ranPlatforms>
    <subReleases type="array">
    </subReleases>
    <requirementAreaId type="number">xxxxxx</requirementAreaId>
    <requirementArea type="string">xxxxxx</requirementArea>
    <oaResultReference type="string">xxxxxx</oaResultReference>
    <itemContact type="string">xxxxxx</itemContact>
    <f0Date type="string">2014-10-17</f0Date>
    <f1Date type="string">2015-01-16</f1Date>
    <f2Date type="string">2015-02-13</f2Date>
    <f4Date type="string">2015-06-12</f4Date>
    <faDate type="string">2015-06-12</faDate>
    <fgDate type="string">2015-06-12</fgDate>
    <toBeHandledAtxxx type="string">xxxxxx</toBeHandledAtLuca>
  </item>
 </root>

Sample 2

<ROOT>  
<Customer CustomerID="VINET" ContactName="Paul Henriot">  
   <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">  
      <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>  
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>  
   </Order>  
</Customer>  
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">  
   <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">  
      <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>  
   </Order>  
</Customer>  
</ROOT>

SQL

Generic staging Table

create table ZZZZZZZZZ
(
api_id int,
record_type char(1),
record_id INT,
last_run_time datetime,
last_run_by varchar(500),
col1 VARCHAR(500),
col2 VARCHAR(500),
col3 VARCHAR(500),
col4 VARCHAR(500),
col5 VARCHAR(500),
col6 VARCHAR(500),
col7 VARCHAR(500),
col8 VARCHAR(500),
col9 VARCHAR(500),
col10 VARCHAR(500),
col11 VARCHAR(500),
col12 VARCHAR(500),
col13 VARCHAR(500),
col14 VARCHAR(500),
col15 VARCHAR(500),
col16 VARCHAR(500),
col17 VARCHAR(500),
col18 VARCHAR(500),
col19 VARCHAR(500),
col20 VARCHAR(500),
col21 VARCHAR(500),
col22 VARCHAR(500),
col23 VARCHAR(500),
col24 VARCHAR(500),
col25 VARCHAR(500),
col26 VARCHAR(500),
col27 VARCHAR(500),
col28 VARCHAR(500),
col29 VARCHAR(500),
col30 VARCHAR(500),
col31 VARCHAR(500),
col32 VARCHAR(500),
col33 VARCHAR(500),
col34 VARCHAR(500),
col35 VARCHAR(500),
col36 VARCHAR(500),
col37 VARCHAR(500),
col38 VARCHAR(500),
col39 VARCHAR(500),
col40 VARCHAR(500),
col41 VARCHAR(500),
col42 VARCHAR(500),
col43 VARCHAR(500),
col44 VARCHAR(500),
col45 VARCHAR(500),
col46 VARCHAR(500),
col47 VARCHAR(500),
col48 VARCHAR(500),
col49 VARCHAR(500),
col50 VARCHAR(500),
col51 VARCHAR(500),
col52 VARCHAR(500),
col53 VARCHAR(500),
col54 VARCHAR(500),
col55 VARCHAR(500),
col56 VARCHAR(500),
col57 VARCHAR(500),
col58 VARCHAR(500),
col59 VARCHAR(500),
col60 VARCHAR(500),
col61 VARCHAR(500),
col62 VARCHAR(500),
col63 VARCHAR(500),
col64 VARCHAR(500),
col65 VARCHAR(500),
col66 VARCHAR(500),
col67 VARCHAR(500),
col68 VARCHAR(500),
col69 VARCHAR(500),
col70 VARCHAR(500),
col71 VARCHAR(500),
col72 VARCHAR(500),
col73 VARCHAR(500),
col74 VARCHAR(500),
col75 VARCHAR(500),
col76 VARCHAR(500),
col77 VARCHAR(500),
col78 VARCHAR(500),
col79 VARCHAR(500),
col80 VARCHAR(500),
col81 VARCHAR(500),
col82 VARCHAR(500),
col83 VARCHAR(500),
col84 VARCHAR(500),
col85 VARCHAR(500),
col86 VARCHAR(500),
col87 VARCHAR(500),
col88 VARCHAR(500),
col89 VARCHAR(500),
col90 VARCHAR(500),
col91 VARCHAR(500),
col92 VARCHAR(500),
col93 VARCHAR(500),
col94 VARCHAR(500),
col95 VARCHAR(500),
col96 VARCHAR(500),
col97 VARCHAR(500),
col98 VARCHAR(500),
col99 VARCHAR(500),
col100 VARCHAR(500),
col101 VARCHAR(500),
col102 VARCHAR(500),
col103 VARCHAR(500),
col104 VARCHAR(500),
col105 VARCHAR(500),
col106 VARCHAR(500),
col107 VARCHAR(500),
col108 VARCHAR(500),
col109 VARCHAR(500),
col110 VARCHAR(500),
col111 VARCHAR(500),
col112 VARCHAR(500),
col113 VARCHAR(500),
col114 VARCHAR(500),
col115 VARCHAR(500),
col116 VARCHAR(500),
col117 VARCHAR(500),
col118 VARCHAR(500),
col119 VARCHAR(500),
col120 VARCHAR(500),
col121 VARCHAR(500),
col122 VARCHAR(500),
col123 VARCHAR(500),
col124 VARCHAR(500),
col125 VARCHAR(500),
col126 VARCHAR(500),
col127 VARCHAR(500),
col128 VARCHAR(500),
col129 VARCHAR(500),
col130 VARCHAR(500),
col131 VARCHAR(500),
col132 VARCHAR(500),
col133 VARCHAR(500),
col134 VARCHAR(500),
col135 VARCHAR(500),
col136 VARCHAR(500),
col137 VARCHAR(500),
col138 VARCHAR(500),
col139 VARCHAR(500),
col140 VARCHAR(500),
col141 VARCHAR(500),
col142 VARCHAR(500),
col143 VARCHAR(500),
col144 VARCHAR(500),
col145 VARCHAR(500),
col146 VARCHAR(500),
col147 VARCHAR(500),
col148 VARCHAR(500),
col149 VARCHAR(500),
col150 VARCHAR(500)
)

Sample Output

(screenshot of the sample output omitted)

Generic XML parser written in T-SQL. There are a few hacks and some stray code that need to be removed. This works fine, but the trouble is in sending the entire XML document as an input parameter from the C# code, either through a direct call or through a file.

CREATE PROC ZZZZZZZ
(
@in_api_id int,
@in_xml_doc XML,
@in_xml_root varchar(100),
@in_tot_result_col int = 150,
@in_need_colnm_result CHAR(1) = 'Y',
@in_debug_flg CHAR(1) = 'N'
)
AS
BEGIN
DECLARE 
    @idoc int, 
    @sqlstr nvarchar(max) = '',
    @param nvarchar(200) = '',
    @runtime datetime = getdate(),
    @runby varchar(30) = suser_name(),
    @cnt int,
    @pre_stg_col_nm varchar(max) = '',
    @max_lvl int,
    @max_node varchar(500)='',
    @max_node_wo_slash varchar(500)='',
    @xml_col nvarchar(max) = '',
    @unq_col nvarchar(max) = '',
    @unq_xml_col nvarchar(max)=''

--Create an internal representation of the XML document.  
EXEC sp_xml_preparedocument @idoc OUTPUT, @in_xml_doc;  
-- Execute a SELECT statement that uses the OPENXML rowset provider.  
set @in_xml_root = concat('/',@in_xml_root)
SELECT * into #tmp FROM OPENXML (@idoc, @in_xml_root,2) where id <> 0;

--select * from #tmp_xml_nodes
--select * from #tmp
--select * from #tmp_pre_staging

;with xml_cte(id, parentid, nodetype, localname, prefix, namespaceuri, datatype, prev, text, lvl,node,parent_localname)
AS
(
select  id, 
                parentid, 
                nodetype, 
                localname, 
                prefix, 
                namespaceuri, 
                datatype, 
                prev, 
                text,
                1 as lvl,
                cast(CONCAT(@in_xml_root,'/',localname) as varchar(100)) node,
                cast('' as varchar(200))
from #tmp
where parentid = 0
UNION all
select  t.id, 
                t.parentid, 
                t.nodetype, 
                t.localname, 
                t.prefix, 
                t.namespaceuri, 
                t.datatype, 
                t.prev, 
                t.text,
                iif(t.nodetype = 1,xc.lvl+1,xc.lvl),
                cast(
                        CONCAT (
                                        xc.node
                                        ,iif(t.nodetype = 1, 
                                                CONCAT (
                                                            '/'
                                                            ,t.localname
                                                            )
                                                ,''
                                                )
                                        ) AS VARCHAR(100)
                            ),
                cast(xc.localname as varchar(200))
from #tmp t
inner join xml_cte xc
on xc.id = t.parentid
)
select * into #xmlcte from xml_cte

--select * from #xmlcte
--v2 change
select @max_lvl = max(lvl)--iif(max(lvl)>=4,1,0) -- the iif condition is just a hack, I don't know why it works
from #xmlcte 

select 
    @max_node = concat(max(node),'/'),
    @max_node_wo_slash = max(node) 
from #xmlcte 
where lvl = @max_lvl

select *,concat(parent_localname,'_',localname,' varchar(500)') fnl_col_nm,
                                case 
                                when lvl<@max_lvl then concat(replicate('../',@max_lvl-lvl+iif(nodetype=1,nodetype,0)),iif(nodetype=1,'','@'),localname) --v2 change
                                when lvl>@max_lvl then concat(replace(node,@max_node,''),iif(nodetype=1,'','/@'),localname)--v2 change
                                else concat('../',iif(nodetype=1,'',concat(parent_localname,'/@')),localname)--v2 change
                                end col_Struct
        ,concat(parent_localname,'_',localname) col_unq_nm
        ,ROW_NUMBER() over (order by(select 100)) sno
        ,concat('xmlname.value(''/Names[1]/name[',ROW_NUMBER() over (order by(select 100)),']'',''varchar(500)'') AS ',concat(parent_localname,'_',localname)) col_splt_nm
into #xml_col_struct
from #xmlcte
where nodetype <= 2--v2 change

--select * from #xml_col_struct
set @cnt = (select count(distinct col_unq_nm) from #xml_col_struct)

select @pre_stg_col_nm =
(
select concat(',',COLUMN_NAME)
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'ZZZZZZ'
and COLUMN_NAME like 'col%'
and ORDINAL_POSITION <= @cnt+5
order by ORDINAL_POSITION
for xml path('')
)

set @sqlstr = concat(
                                        'insert into ZZZZZ(api_id,record_type,record_id,last_run_time,last_run_by',
                                        @pre_stg_col_nm,
                                        ')'
                                        )
select @xml_col =
(
select distinct concat(',',fnl_col_nm,' ''',col_Struct,'''',char(10)) 
from #xml_col_struct
order by 1
for xml path('')
)
set @xml_col = stuff(@xml_col,1,1,'')

select @unq_col =
(
select distinct concat(',',col_unq_nm )
from #xml_col_struct
order by 1
for xml path('')
)
set @unq_col = stuff(@unq_col,1,1,'')

select @in_tot_result_col = @in_tot_result_col - count(distinct col_unq_nm)
from #xml_col_struct

select @unq_xml_col =
(
select 
concat(',xmlname.value(''/Names[1]/name[',ROW_NUMBER() over (order by(select 100)),']'',''varchar(500)'') AS ',col_unq_nm,char(10))
from (select distinct col_unq_nm from #xml_col_struct) t
for xml path('')
)
set @unq_xml_col = stuff(@unq_xml_col,1,1,'')

set @sqlstr =
                        concat(
                                    iif(@in_need_colnm_result = 'Y',
                                    concat('
                                                ;WITH Split_Names (xmlname)
                                                AS
                                                (
                                                        SELECT 
                                                        CONVERT(XML,''<Names><name>''  
                                                        + REPLACE(''',@unq_col,''','','', ''</name><name>'') + ''</name></Names>'') AS xmlname
                                                )
                                                '
                                                --,@sqlstr
                                                ,char(10),
                                                ' SELECT ',@in_api_id,',''H'',0,''',@runtime,''',''',@runby,''',',char(10)
                                                ,@unq_xml_col,replicate(',NULL',@in_tot_result_col)--v2 change
                                                ,char(10)
                                                ,'FROM Split_Names'
                                                ,char(10)
                                                ,'union all'
                                                )
                                        ,''
                                        )
                                    --,iif(@in_need_colnm_result = 'Y','',@sqlstr)
                                    ,'
                                    SELECT ',@in_api_id,',''D'',ROW_NUMBER() over (order by(select 100)),''',@runtime,''',''',@runby,''',*'  
                                    ,replicate(',NULL',@in_tot_result_col)--v2 change
                                    ,char(10)
                                    ,'FROM   OPENXML (@idoc_inn, ''',@max_node_wo_slash,''',2)'   
                                    ,char(10)
                                    ,'WITH (',@xml_col,')'
                                    )

if @in_debug_flg = 'Y'
    begin
        select @max_lvl+1,@max_lvl,@max_node_wo_slash,@xml_col,@unq_col,@sqlstr,@unq_xml_col
        select * from #xml_col_struct--v2 change
    end
else
    begin
        set @param = '@idoc_inn int'
        exec sys.sp_executesql @sqlstr,@param,@idoc_inn = @idoc
    end
EXEC sp_xml_removedocument @idoc

END

SQL code to read the XML file loaded by the C# class. This also works fine, but the trouble is that all lines end up in separate rows, and concatenation truncates after a point.

create table #tmp(data_line nvarchar(max))

bulk insert #tmp
FROM '\\Server\ZZZZ\Downloads\Data.xml'  -- T-SQL does not escape backslashes, so the UNC path uses single separators
   WITH   
      ( 
                --firstrow = 1          
                 ROWTERMINATOR ='\n'  
      );  

select * from #tmp

C# class

Object httpConn = Dts.Connections["HTTP"].AcquireConnection(null);
HttpClientConnection myConnection = new HttpClientConnection(httpConn);
// NOTE: this format string has no {0}/{1} placeholders, so the credential
// arguments are effectively ignored here
myConnection.ServerURL = string.Format("http://xxxx.com/jjjj", "userid", "password");
byte[] webdata = myConnection.DownloadData();

// Convert the JSON payload to XML using the built-in JSON reader
XmlDocument xd = new XmlDocument();
XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(webdata, XmlDictionaryReaderQuotas.Max);

xr.Read();
xd.LoadXml(xr.ReadOuterXml());
// A verbatim string keeps the leading double backslash of the UNC path intact
xd.Save(@"\\Server\ZZZZ\Downloads\Data.xml");
  • What you may be hitting here are some of the inherent limitations of SQL Server: 1024 columns per non-wide table, 30000 per wide table. If you are using non-wide tables and the XML has more than 1024 elements per record, the table creation will fail. – Kevin Oct 20 '16 at 13:30
  • We are sure that we won't see more than 100 elements or attributes in any API. – Jayesh Prakash Oct 20 '16 at 13:39
  • Ok then, but that was not what you stated in your question. You said any XML. – Kevin Oct 20 '16 at 13:48
  • Sorry, my bad... – Jayesh Prakash Oct 20 '16 at 13:49
  • The main point is here: *in a generic tsql table*. Actually it is only the **a**: Is the base structure easy (something like key-value pairs), or might this be deeply nested with *1:n-related* data? Is the XML *one big thing*, or might it carry many records? Do you need the structure *row-wise* (read about key-value pairs), or do you need wide tables with named columns? Please provide some examples of (reduced) XML showing what it looks like and how you want it stored. – Shnugo Oct 20 '16 at 15:04
  • @Shnugo I have added more details. Please see above. – Jayesh Prakash Oct 21 '16 at 06:02
  • It might be helpful to post an example of your JSON too. As far as I can see, the whole misery comes from your attempts to transfer JSON to XML and then to SQL Server in order to fill a very strange generic table. [Read about the XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem)... Tell us the start and the expected goal and let the community offer you an approach... It might be that your input was too much and made us blind to a better solution... – Shnugo Oct 24 '16 at 12:12
  • @Shnugo I have posted the JSON format as well. – Jayesh Prakash Oct 25 '16 at 07:04

4 Answers


If you are getting properly formatted XML, you can use a DataSet - specifically DataSet.ReadXml(). This will load your XML into a DataSet object independent of the XML tags. Then you can use ADO.NET, LINQ to SQL, EF, or any other data-access method to put it into the database.

Since you are saving the file on the server, you can use the code below:

DataSet ds = new DataSet();
ds.ReadXml(@"\\Server\ZZZZ\Downloads\Data.xml"); 

Then you can iterate over each table in the DataSet using a foreach loop. The attributes in the XML become columns of the DataTable.

So your final code will look something like this:

using (DataSet ds = new DataSet())
{
     ds.ReadXml(@"\\Server\ZZZZ\Downloads\Data.xml");
     int nTableCounts = ds.Tables.Count;
     foreach(DataTable dt in ds.Tables)
     {
          using (dt)
          {
            //Put data in SQL table.
          }
     }
}

Let me know if anything is unclear.

Pratik Gaikwad
  • Thanks Pratik, will try and wait for sample code from you. – Jayesh Prakash Oct 22 '16 at 13:43
  • I am not too good in C#, so I want to know whether the methods you referred to are third party or not. – Jayesh Prakash Oct 22 '16 at 14:24
  • @JayeshPrakash No. It's built-in .NET functionality. You will have to reference the **"System.Data"** assembly if it's not already referenced in your project. – Pratik Gaikwad Oct 22 '16 at 14:25
  • And do we need to create the structure of the DataTable beforehand, or will it be created dynamically with the elements as column names? – Jayesh Prakash Oct 22 '16 at 14:29
  • @JayeshPrakash It will be created dynamically. – Pratik Gaikwad Oct 22 '16 at 14:31
  • Sorry for spamming, one last question before I dive in: will a DataTable be created for every level of the XML hierarchy from the root onwards? – Jayesh Prakash Oct 22 '16 at 14:45
  • @JayeshPrakash Yes. Basically there will be a separate table for each XML node, such that the attributes of an individual node become the columns of the corresponding table. So in the case of your sample 2 (explaining it since it's smaller) there will be 3 tables: Customer, Order, OrderDetail. And the attributes of the Customer node will be the columns of the Customer table. – Pratik Gaikwad Oct 22 '16 at 14:49
  • @JayeshPrakash Just following up... any issues? Is the solution working properly? – Pratik Gaikwad Oct 23 '16 at 14:31
  • Thanks for following up. Yes, I am trying; it worked like a charm for the sample 2 XML. However, with sample 1 it is erroring out with **System.ArgumentException: The table (item) cannot be the child table to itself in nested relations.** because the same element gets created at every level when converting from JSON to XML. Is there a possibility of using a similar system function for JSON instead of XML? Or of NOT generating it at every level? – Jayesh Prakash Oct 24 '16 at 07:38
  • Is there a possibility of combining all tables into one bigger denormalized table? – Jayesh Prakash Oct 25 '16 at 07:07
  • I need to check. I didn't get time yesterday, but I will check today and let you know. – Pratik Gaikwad Oct 25 '16 at 11:46

Thank you for providing so many details. Now - together with the bounty - people rush in to help.

This is not really an answer; at least it's not a solution for your actual issue! Rather some thoughts / hints:

To be honest...

This should be done completely differently!

Your current approach is awful...

  • The way you read the XML line-wise
  • The way you shred it with FROM OPENXML
  • Huge amounts of string-based code...
  • The senseless storage of untyped string values
  • A multi-hundred-column staging table with meaningless column names...

I doubt that this concept will ever work with just any XML response...

Validity

Your sample 1 is not even valid XML:

  • a missing opening tag <item>
  • a mismatched closing tag: <toBeHandledAtxxx type="string">xxxxxx</toBeHandledAtLuca>
  • datetimes not in the proper ISO 8601 format... The given format will work, but it is a sign that this XML is not created in the best way...

If this is your actual response, any XML-based approach will never work. I hope this happened only in order to shorten the sample...

Hierarchy

In sample 1 there can be any number of items below <root>. There are 9 (!) areas of potential 1:n data (more than one team, more than one release, ...).

In relational terms you'd have to spread this over several related tables.

Your sample 2 is quite plain, but it has a 3-level hierarchy too.

How do you want to put this into your multi-hundred-column table? Side by side? Tons of rows due to a Cartesian product?

Naming

Your sample 2 might be fairly easy to read into a plain structure. You can use local-name() to know which element is a Customer and which is an OrderDetail.

But in your sample 1 all of them are called <item>...

Structure

Your sample 2 is plain attribute-centered XML, while your sample 1 mixes attribute- and element-centered data.

Import

One day the moment comes when you want to shuffle the data from your staging table into fitting structures. Reading from your crude multi-hundred-column table means that you have to know everything about the original XML. You must work with complex mapping tables in order to know what the value of col107 is (semantics, type, format...). For this process you will need a very specific routine for each single kind of XML.

Reading the XML file

At the bottom you show how you are reading the XML file from SQL Server.

You follow a line-wise approach, which leads you to state: *but the trouble is all lines are in separate rows*. Better try it like this; it will read the whole XML in one single go:

DECLARE @x XML=
(
    SELECT x 
    FROM OPENROWSET(BULK 'C:\YourPath\YourXMLFile.xml', SINGLE_BLOB) AS YourFile(x)
);
select @x;

My advice

Especially the last point (a very specific routine for each single kind of XML) cries for an import of your XML as-is. There is - at least in my eyes - absolutely no value in your attempts to get any XML into one huge flat table. Better use a slim staging table with some meta columns and one single column for the full XML (possibly as NVARCHAR(MAX) to keep it more tolerant of invalid XML). There you fill in the XML exactly as you've read it.
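Such a slim staging table could look like this sketch (the table and column names here are illustrative, not part of the question):

```sql
-- Sketch: a few meta columns plus the payload exactly as received.
-- NVARCHAR(MAX) instead of XML keeps it tolerant of invalid XML.
CREATE TABLE dbo.ApiStaging
(
    staging_id  INT IDENTITY(1,1) PRIMARY KEY,
    api_id      INT           NOT NULL,
    loaded_at   DATETIME      NOT NULL DEFAULT GETDATE(),
    loaded_by   VARCHAR(128)  NOT NULL DEFAULT SUSER_SNAME(),
    raw_payload NVARCHAR(MAX) NULL
);
```

The specific reading process per XML kind then works off raw_payload, casting it to XML only when it is known to be valid.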

Anyway you will need a specific reading process for each kind of XML. Create this directly for the XML.

Do not use FROM OPENXML. This is outdated... In your case it still might be a possible way, but - if performance does not matter - you should rather use the function you find in John Capeletti's answer. It will recursively traverse down literally any XML and provide the information in a table - without the need to dive into XPath and XQuery.

Best would be to define a specific reading process for each kind of XML based on the modern XML methods .nodes(), .query(), and .value(). This will be clean, fast, and maintainable.
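For your sample 2, such a reading process could look like this sketch (column types are assumptions; adjust to your real data):

```sql
DECLARE @x XML = N'<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
   <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
      <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
   </Order>
</Customer>
</ROOT>';

-- One row per OrderDetail, with the parent Customer and Order data joined in
SELECT  c.value('@CustomerID',  'varchar(10)')  AS CustomerID,
        c.value('@ContactName', 'varchar(100)') AS ContactName,
        o.value('@OrderDate',   'datetime')     AS OrderDate,
        d.value('@OrderID',     'int')          AS OrderID,
        d.value('@ProductID',   'int')          AS ProductID,
        d.value('@Quantity',    'int')          AS Quantity
FROM @x.nodes('/ROOT/Customer') AS A(c)
CROSS APPLY c.nodes('Order')       AS B(o)
CROSS APPLY o.nodes('OrderDetail') AS C(d);
```

The nested CROSS APPLY calls follow the hierarchy down, so the 1:n relations come out as plain rows instead of a wide generic table.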

Come back with new question(s) if you need help with this!

A last reason to convince you

Just imagine your XML sender adds one element somewhere in the middle. In your table the same content moves from col108 to col109. Do you really want to get into versioning your reading process? If you read the data directly from the XML, the added element is simply added to the reading process, and older XML without this value can be read by the same engine without trouble...

Shnugo
  • Thanks a ton for your comments. At times, no matter what we feel about a design, we have to adhere to the constraints set by the clients. About your comment that all of them are called <item>: the item element is generated at every level because of the conversion from JSON to XML in the C# class, here: `XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(webdata, XmlDictionaryReaderQuotas.Max);` – Jayesh Prakash Oct 24 '16 at 07:08
  • @JayeshPrakash Did you try the few lines that read an XML file into a variable in one single call? If this works for you, try it with the function linked by John Capeletti. And I'm curious: what happens if the received XML changes (internal order, new elements...)? How are you storing Cartesian 1:n data? I'd silently add an XML column, go my approach, and just tell the client that everything is taken from this *crazy-generic-mega-monster-table* :-) – Shnugo Oct 24 '16 at 07:31
  • Yes I did; the real XML file, which is 10 MB with a relational size of 6000 records, is still running (it started more than an hour ago). Another issue is how to stop the item element being generated for every level in C# through this line of code: `XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(webdata, XmlDictionaryReaderQuotas.Max);` – Jayesh Prakash Oct 24 '16 at 07:35
  • There's something wrong... 10 MB is not that huge... Try the approach with a tiny (but valid!) test file. This should work much faster... – Shnugo Oct 24 '16 at 07:45
  • A small file - sample 2 - worked quickly. A slightly bigger file with 100 records worked fine too. I suspect that the extra level added before every level may be causing a cross join. I am checking on it. If somehow I can prevent that from being generated, life will be easier for both solutions. Thanks for staying with me. – Jayesh Prakash Oct 24 '16 at 07:50
  • @JayeshPrakash Your sentence *Another issue is how to stop generating for every level in C#* and the last comment make me ask: Why? Just to transcode JSON? Would it be possible to upgrade to SQL Server 2016? It has native JSON support... No need to do any XML stuff... – Shnugo Oct 24 '16 at 07:57
  • No, it's not on the roadmap of the IT division yet. – Jayesh Prakash Oct 24 '16 at 08:03
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/126580/discussion-between-jayesh-prakash-and-shnugo). – Jayesh Prakash Oct 25 '16 at 07:20

Consider the following.

I use a helper function (shamelessly lifted from http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx ... with a couple of tweaks, i.e. range keys) to convert virtually any XML into a hierarchy.

I should note that I opted for temp tables rather than a series of CTEs, just for performance and convenience. I'm sure that, if you want, you could easily migrate to a CTE approach.

Regarding performance, we're looking at 90-110 ms for the sample files provided. I can't speak to how well this would perform with a LARGE XML source.

The SQL

--Drop Table #TempBase;Drop Table #TempCols;Drop Table #TempHier;Drop Table #TempPivot

-- Declare @Vars
Declare @in_api_id int = 1
Declare @runby varchar(30)='Some User'
Declare @XML xml ='<Root><Customer CustomerID="VINET" ContactName="Paul Henriot"><Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00"><OrderDetail OrderID="10248" ProductID="11" Quantity="12"/><OrderDetail OrderID="10248" ProductID="42" Quantity="10"/></Order></Customer><Customer CustomerID="LILAS" ContactName="Carlos Gonzlez"><Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00"><OrderDetail OrderID="10283" ProductID="72" Quantity="3"/></Order></Customer></Root>'
--Declare @XML xml='<catalog><product description="Cardigan Sweater" product_image="cardigan.jpg"><catalog_item gender="Men''s"><item_number>QWZ5671</item_number><price>39.95</price><size description="Medium"><color_swatch image="red_cardigan.jpg">Red</color_swatch><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch></size><size description="Large"><color_swatch image="red_cardigan.jpg">Red</color_swatch><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch></size></catalog_item><catalog_item gender="Women''s"><item_number>RRX9856</item_number><price>42.50</price><size description="Small"><color_swatch image="red_cardigan.jpg">Red</color_swatch><color_swatch image="navy_cardigan.jpg">Navy</color_swatch><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch></size><size description="Medium"><color_swatch image="red_cardigan.jpg">Red</color_swatch><color_swatch image="navy_cardigan.jpg">Navy</color_swatch><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch><color_swatch image="black_cardigan.jpg">Black</color_swatch></size><size description="Large"><color_swatch image="navy_cardigan.jpg">Navy</color_swatch><color_swatch image="black_cardigan.jpg">Black</color_swatch></size><size description="Extra Large"><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch><color_swatch image="black_cardigan.jpg">Black</color_swatch></size></catalog_item></product></catalog>'

--Select * From [dbo].[udf-XML-Hier](@XML) Order by R1

-- Generate Base Hier Data from XML
Select * Into #TempBase From [dbo].[udf-XML-Hier](@XML) Order by R1

-- Generate Required Columns with Sequence
Select *
      ,ColSeq   = Row_Number() over (Order by MinR1)
      ,ColName  = concat('col',Row_Number() over (Order by MinR1))
      ,ColTitle = IIF(Attribute='','_','')+Element+IIF(Attribute='','','_'+Attribute)
 Into  #TempCols
 From (
        Select Element,Attribute,MinR1 = Min(R1)
        From   #TempBase
        Where  R1>1 
        Group  By Element,Attribute
      ) A

-- Extend Base Data with Col Seq
Select A.*,C.ColSeq,RowSeq = 1+Sum(RowFlg) over (Order By R1)
Into   #TempHier
From  (Select *,RowFlg =IIF(Lag(Lvl,1) over (Order By R1)>Lvl,1,0) From #TempBase) A
Join  #TempCols C on (A.Element=C.Element and A.Attribute=C.Attribute)

-- Generate Data to be Pivoted and Augment for Inheritance
Select RowSeq=0,ColSeq,ColName,Value = cast(ColTitle as varchar(max))
Into   #TempPivot
From   #TempCols
Union  All
Select A.RowSeq,A.ColSeq,A.ColName,Value = IsNull(B.Value,'')
From  (
        Select A.*,R1 = case when B.R1 is not null then B.R1 else (Select Max(R1) from #TempHier Where ColSeq=A.ColSeq and RowSeq<=A.RowSeq) end
         From (
                Select A.*,B.*
                 From (Select Distinct RowSeq From #TempHier) A
                 Join (Select * From #TempCols) B on (1=1) 
              ) A
         Left Join #TempHier B on (A.RowSeq=B.RowSeq and A.ColSeq=B.ColSeq )
      ) A
 Join  #TempHier B on (A.R1=B.R1)

-- Build and Execute the Final Select
Declare @SQL varchar(max) = ''
Select @SQL = @SQL+concat(',',ColName,'=max(case when ColSeq=',ColSeq,' then Value else null end)') from #TempCols Order by ColSeq
Select @SQL = '
Select api_id        = '+cast(@in_api_id as varchar(25))+' 
      ,record_type   = max(case when RowSeq=0 then ''H'' else ''D'' end)
      ,record_id     = RowSeq
      ,last_run_time = GetDate()
      ,last_run_by   ='''+@runby+''''
      +@SQL+'
 From  #TempPivot
 Group By RowSeq
 Order By RowSeq
'
Exec(@SQL)
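The final dynamic-SQL step is a classic conditional-aggregation pivot: each distinct ColSeq becomes a `max(case when ColSeq=n then Value end)` column, grouped by RowSeq. The same transformation, sketched in Python for clarity (illustrative only; names are mine):

```python
from collections import defaultdict

def pivot(triples, n_cols):
    """Turn (row_seq, col_seq, value) triples into one wide row per row_seq,
    mirroring the max(case when ColSeq=n then Value end) dynamic pivot."""
    grid = defaultdict(lambda: [''] * n_cols)
    for row_seq, col_seq, value in triples:
        grid[row_seq][col_seq - 1] = value          # col_seq is 1-based
    return [grid[k] for k in sorted(grid)]          # row_seq 0 is the header row

rows = pivot([(0, 1, 'Customer_CustomerID'), (0, 2, 'Customer_ContactName'),
              (1, 1, 'VINET'), (1, 2, 'Paul Henriot')], 2)
print(rows)
```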

Returns

*(screenshot: the pivoted result set, one header row followed by one row per record)*

Performance

I know this is a small sample, but the results are returned between 80 and 160ms.


The Table-Valued Function (if needed)

CREATE FUNCTION [dbo].[udf-XML-Hier](@XML xml)
Returns Table 
As Return
with  cte0 as ( 
                  Select Lvl       = 1
                        ,ID        = Cast(1 as int) 
                        ,Pt        = Cast(NULL as int)
                        ,Element   = x.value('local-name(.)','varchar(150)')
                        ,Attribute = cast('' as varchar(150))
                        ,Value     = x.value('text()[1]','varchar(max)')
                        ,XPath     = cast(concat(x.value('local-name(.)','varchar(max)'),'[' ,cast(Row_Number() Over(Order By (Select 1)) as int),']') as varchar(max))
                        ,Seq       = cast(10000001 as varchar(max))
                        ,AttData   = x.query('.') 
                        ,XMLData   = x.query('*') 
                  From   @XML.nodes('/*') a(x) 
                  Union  All
                  Select Lvl       = p.Lvl + 1 
                        ,ID        = Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10
                        ,Pt        = p.ID
                        ,Element   = c.value('local-name(.)','varchar(150)')
                        ,Attribute = cast('' as varchar(150))
                        ,Value     = cast( c.value('text()[1]','varchar(max)') as varchar(max) ) 
                        ,XPath     = cast(concat(p.XPath,'/',c.value('local-name(.)','varchar(max)'),'[',cast(Row_Number() Over(PARTITION BY c.value('local-name(.)','varchar(max)') Order By (Select 1)) as int),']') as varchar(max) )
                        ,Seq       = cast(concat(p.Seq,' ',10000000+Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10) as varchar(max))
                        ,AttData   = c.query('.') 
                        ,XMLData   = c.query('*') 
                  From   cte0 p 
                  Cross  Apply p.XMLData.nodes('*') b(c) 
              )
    , cte1 as (   
                  Select R1 = Row_Number() over (Order By Seq),A.*
                  From  (
                          Select  Lvl,ID,Pt,Element,Attribute,Value,XPath,Seq From cte0
                          Union All
                          Select Lvl       = p.Lvl+1
                                ,ID        = p.ID + Row_Number() over (Order By (Select NULL)) 
                                ,Pt        = p.ID
                                ,Element   = p.Element
                                ,Attribute = x.value('local-name(.)','varchar(150)')
                                ,Value     = x.value('.','varchar(max)')
                                ,XPath     = p.XPath + '/@' + x.value('local-name(.)','varchar(max)')
                                ,Seq       = cast(concat(p.Seq,' ',10000000+p.ID + Row_Number() over (Order By (Select NULL)) ) as varchar(max))
                          From   cte0 p 
                          Cross  Apply AttData.nodes('/*/@*') a(x) 
                        ) A 
               )

Select A.R1
      ,R2  = IsNull((Select max(R1) From cte1 Where Seq Like A.Seq+'%'),A.R1)
      ,A.Lvl
      ,A.ID
      ,A.Pt
      ,A.Element
      ,A.Attribute
      ,A.XPath
      ,Title = Replicate('|---',Lvl-1)+Element+IIF(Attribute='','','@'+Attribute)
      ,A.Value
 From  cte1 A

/*
Source: http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx

Declare @XML xml='<person><firstname preferred="Annie" nickname="BeBe">Annabelle</firstname><lastname>Smith</lastname></person>'
Select * from [dbo].[udf-XML-Hier](@XML) Order by R1
*/

Edit - Just for fun, I grabbed an XML file from http://www.service-architecture.com/articles/object-oriented-databases/xml_file_for_complex_data.html, and the results are as follows:

*(screenshot: results for the complex-data sample file)*

John Cappelletti
  • This function is your best friend I bet :-) I was just thinking about placing it myself. Good luck anyway... – Shnugo Oct 23 '16 at 19:24
  • @Shnugo I just like re-usable code. I want to serve many masters without having to re-invent the wheel every time. – John Cappelletti Oct 23 '16 at 19:33
  • @Shnugo The TVF is indeed the workhorse here. OP had some odd formatting requirements, thus all the secondary code. – John Cappelletti Oct 23 '16 at 19:41
  • Shredding an unknown XML can be done *row-by-row* only... The linked function is a great tool to get each value with meta-data like the full path. But I think one should not support the OP's wish to store all data *side-by-side*... Anyway +1 from my side – Shnugo Oct 23 '16 at 23:23
  • @Shnugo Just read your post (twice) and I do agree with you. Honestly, I see little to no value in this structure. That said, I've done things in the past which were met with resistance and eventually proved to be value-added. My grandfather always told me to "judge slowly". Besides, I dig the puzzles. – John Cappelletti Oct 23 '16 at 23:35
  • @JohnCappelletti Sorry, I cannot post the real production XML file due to data security, but it is in line with Sample 1. The XML file contains 6000 records with varying levels of depth for each top-level node. For ex: one node will have data till , the other node will have data all the way till . Your helper code has been running for more than 90 minutes now. – Jayesh Prakash Oct 24 '16 at 07:55
  • @JayeshPrakash Clearly something is wrong there. 6,000 records is not huge. I just tested a 16,501 row file in 20 seconds: 18 seconds for the initial parse and the remaining 2 seconds for the final formatting (on my laptop). – John Cappelletti Oct 24 '16 at 13:57
  • @JayeshPrakash Do you NOT have a SINGLE top/root node? – John Cappelletti Oct 24 '16 at 13:59
2

Why use SQL Server to deserialize the file? This is something that's better left to C#.

I haven't done this in C#, but you can deserialize the XML into a List of Strings, and iterate through the list to build a record to insert into the database. Here's a generic deserialization method that might help.

I would think it would be faster, and you won't piss off your DBAs by doing heavy string manipulation in the server.
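To make the streaming idea concrete — walking an unknown XML document and emitting one flat record per repeating element, with no classes declared up front — here is a hedged Python sketch (in C#, `XmlReader`/`XmlDictionaryReader` would follow the same pattern; all names and the sample data here are illustrative):

```python
import xml.etree.ElementTree as ET
from io import StringIO

def stream_records(xml_source, record_tag):
    """Stream an XML document and yield one dict per <record_tag> element,
    without knowing the rest of the schema in advance."""
    for event, elem in ET.iterparse(xml_source, events=('end',)):
        if elem.tag == record_tag:
            record = dict(elem.attrib)              # attributes become columns
            for child in elem:
                record[child.tag] = (child.text or '').strip()
            yield record
            elem.clear()  # release the subtree — this is what keeps huge files cheap

doc = StringIO('<Root>'
               '<Order id="1"><item>apple</item><qty>3</qty></Order>'
               '<Order id="2"><item>pear</item><qty>5</qty></Order>'
               '</Root>')
for rec in stream_records(doc, 'Order'):
    print(rec)
```

Each yielded dict could then be turned into a parameterized INSERT, which sidesteps loading the whole file into the database at once — the problem the OP hit with the pure OPENXML approach.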

Mike
  • Hi Mike, thanks for the pointer. I am not too good with C#, so pardon any misunderstanding. If I read this solution correctly, one still needs to define the structure of the XML in some class, which means I would have to define the structure of every XML I need to deserialize – Jayesh Prakash Oct 22 '16 at 13:40
  • Not the best example, but you should be able to pass it as a String. – Mike Oct 22 '16 at 13:44
  • The above poster may already have a better solution, but you should also be able to iterate through the document using XmlDictionaryReader. – Mike Oct 22 '16 at 13:49
  • I agree with this; you should use C# to get the values from the XML and then insert them into the DB. If you want to deserialize an XML string or file, you can try this [tool](https://github.com/631320085/XmlObject) I wrote; it's very simple to use. You can copy the source code (only a few classes) into your project if your company doesn't allow third-party DLLs. – Bucketcode Oct 29 '16 at 05:47