I have a really large and not to pretty XML which I want to import in my sql-server database. The format of the XML is like I said really ugly:
<myxml xmlns="http://somenamespace.whatever.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<mydata>
<item>
<record>some</record>
<record>123</record>
<record xs:nil="true" />
<record>random</record>
<record>234</record>
</item>
<item>
<record>345</record>
<record>in all</record>
<record>these</record>
<record>cells</record>
<record>123asdf</record>
</item>
<item>
<record>how</record>
<record>to</record>
<record>import</record>
<record>987654321</record>
<record xs:nil="true" />
</item>
</mydata>
</myxml>
This is just a small sample. In fact there the XML is over a 100ML, it has over 200k on items and each item has 15 records but this sample will do just fine.
I know what each 'record' in the 'item' represents but for me its enough to import all record-values to column with varchar(100). Lets say this table:"
CREATE TABLE [dbo].[DataFromXml](
[Column1] [varchar](100) NULL,
[Column2] [varchar](100) NULL,
[Column3] [varchar](100) NULL,
[Column4] [varchar](100) NULL,
[Column5] [varchar](100) NULL
) ON [PRIMARY]
GO
I'm able to get this done with this code:
CREATE TABLE XmlTable
(
XMLData XML
)
INSERT INTO XmlTable(XMLData)
SELECT CONVERT(XML, BulkColumn)
FROM OPENROWSET(BULK 'D:\myverylarge.xml', SINGLE_CLOB) AS x;
DECLARE @XML AS XML
SELECT @XML=XMLData FROM XmlTable
;WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' as xs, DEFAULT 'http://somenamespace.whatever.com/schemas/xmldata/1/')
INSERT INTO DataFromXml(Column1, Column2, Column3, Column4, Column5)
SELECT ref.value('record[1][not(@xs:nil = "true")]' ,'varchar(100)') as Column1
,ref.value('record[2][not(@xs:nil = "true")]' ,'varchar(100)') as Column2
,ref.value('record[3][not(@xs:nil = "true")]' ,'varchar(100)') as Column3
,ref.value('record[4][not(@xs:nil = "true")]' ,'varchar(100)') as Column4
,ref.value('record[5][not(@xs:nil = "true")]' ,'varchar(100)') as Column5
FROM @XML.nodes('/myxml/mydata/item') xmlData( ref )
This runs for a minute or 2 which might not be so bad. I dont have a good reference. My feeling is that this could be a lot faster since getting the XML (over 100MB) into the database using the OPENROWSET just takes seconds.
Can I optimize the insert and if so, how would I do that?