Fastest method to import large XML into SQL-Server table

Question

I have a really large and not to pretty XML which I want to import in my sql-server database. The format of the XML is like I said really ugly:

<myxml xmlns="http://somenamespace.whatever.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
    <mydata>
        <item>
            <record>some</record>
            <record>123</record>
            <record xs:nil="true" />
            <record>random</record>
            <record>234</record>    
        </item>
        <item>
            <record>345</record>
            <record>in all</record>
            <record>these</record>
            <record>cells</record>
            <record>123asdf</record>            
        </item>
        <item>
            <record>how</record>
            <record>to</record>
            <record>import</record>
            <record>987654321</record>
            <record xs:nil="true" />
        </item>
    </mydata>
</myxml>

This is just a small sample. In fact there the XML is over a 100ML, it has over 200k on items and each item has 15 records but this sample will do just fine.

I know what each 'record' in the 'item' represents but for me its enough to import all record-values to column with varchar(100). Lets say this table:"

CREATE TABLE [dbo].[DataFromXml](
    [Column1] [varchar](100) NULL,
    [Column2] [varchar](100) NULL,
    [Column3] [varchar](100) NULL,
    [Column4] [varchar](100) NULL,
    [Column5] [varchar](100) NULL
) ON [PRIMARY]
GO

I'm able to get this done with this code:

CREATE TABLE XmlTable
(
    XMLData XML
)

INSERT INTO XmlTable(XMLData)
SELECT CONVERT(XML, BulkColumn) 
FROM OPENROWSET(BULK 'D:\myverylarge.xml', SINGLE_CLOB) AS x;

DECLARE @XML AS XML
SELECT @XML=XMLData FROM XmlTable

;WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' as xs, DEFAULT 'http://somenamespace.whatever.com/schemas/xmldata/1/')
INSERT INTO DataFromXml(Column1, Column2, Column3, Column4, Column5)
SELECT  ref.value('record[1][not(@xs:nil = "true")]' ,'varchar(100)') as Column1
        ,ref.value('record[2][not(@xs:nil = "true")]' ,'varchar(100)') as Column2
        ,ref.value('record[3][not(@xs:nil = "true")]' ,'varchar(100)') as Column3
        ,ref.value('record[4][not(@xs:nil = "true")]' ,'varchar(100)') as Column4
        ,ref.value('record[5][not(@xs:nil = "true")]' ,'varchar(100)') as Column5
        FROM @XML.nodes('/myxml/mydata/item') xmlData( ref )

This runs for a minute or 2 which might not be so bad. I dont have a good reference. My feeling is that this could be a lot faster since getting the XML (over 100MB) into the database using the OPENROWSET just takes seconds.

Can I optimize the insert and if so, how would I do that?

"since getting the XML (over 100MB) into the database using the OPENROWSET just takes seconds." - sure? or" reading the file without any parsing" takes a few seconds. YOu do a REALLY large amount of processing there, which OPENROWSET does imho not do. Like not at all. — TomTom, Jul 10 '20 at 14:34

Shnugo · Accepted Answer · 2020-07-10T21:34:28.830

Dealing with NULL values is something special in XML.

The definition of a NULL value in XML is not existing. So

<a>
    <b>hi</b>
    <c></c>
    <d/>
</a>

<a> is the root elelement.
<b> is an element with a text() node.
<c> is an empty element
<d> is a self-closing element
<e> is - uhm - not there...

Important hint: <c> and <d> are the same, absolutely no difference!

You can query for the element with

.value('(/a/b)[1]','nvarchar(100)')

And you can query for the the text() node specifically

.value('(/a/b/text())[1]','nvarchar(100)')

In this you find a possible answer (a bit hidden): You can do all your code without the NULL checking predicate, if you query for the text() node specifically.

Change this

ref.value('record[1][not(@xs:nil = "true")]' ,'varchar(100)')

to this

ref.value('(record[1]/text())[1]' ,'varchar(100)')

What might break this: If a <record>'s content may be an empty string you would get a NULL back and not ''. But it should be much faster... Hope, this is okay for you...

About performance: Read this answer. It covers your issue quite well. Especially the part where the time is consumed (follow the links in this answer).

Yitzhak Khabinsky · Answer 2 · 2020-07-10T15:13:45.347

2

Just to complement @Shnugo answer.

All credit goes to him.

This is your exact SQL statement. It should give you around 20% performance improvement. Please give it a shot.

;WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' as xs, DEFAULT 'http://somenamespace.whatever.com/schemas/xmldata/1/')
INSERT INTO DataFromXml(Column1, Column2, Column3, Column4, Column5)
SELECT  ref.value('(record[1]/text())[1]' ,'varchar(100)') as Column1
        ,ref.value('(record[2]/text())[1]' ,'varchar(100)') as Column2
        ,ref.value('(record[3]/text())[1]' ,'varchar(100)') as Column3
        ,ref.value('(record[4]/text())[1]' ,'varchar(100)') as Column4
        ,ref.value('(record[5]/text())[1]' ,'varchar(100)') as Column5
FROM @XML.nodes('/myxml/mydata/item') xmlData(ref);

edited Jul 10 '20 at 15:13

answered Jul 10 '20 at 15:05

Yitzhak Khabinsky

18,471
2
15
21

Thx, Yitzhak, my credit goes to you, +1 from my side :-) – Shnugo Jul 10 '20 at 15:24
Thank you both @Shnugo and Yitzhak. I will give this a try for sure. And how about the INSERT INTO, can we somehow replace that with more efficient statements? – Stackberg Jul 10 '20 at 18:03
@Stackberg, there is nothing wrong with the `INSERT` statement. – Yitzhak Khabinsky Jul 10 '20 at 18:24

Fastest method to import large XML into SQL-Server table

2 Answers2