Trying to load XML through Azure Data Lake jobs I faced unbreakable issues with E_RUNTIME_USER_STRINGTOOBIG using Microsoft.Analytics.Samples.Formats
The thing is that XmlDomExtractor the only way to join (or rather apply) elements from inside the same XML file (One-To-Many).
@xml =
EXTRACT Id string,
//...
Products string
FROM @"/pathToXml.xml"
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor("contract",
new SQL.MAP<string, string>{
{"id", "Id"},
//...
// this is actually a node that may contain thousand of child node
{"products", "Products"}
});
Apply:
@data = SELECT c.Id,
//....
pp.ProductSid,
// ... other product properties
FROM @prepareData AS c
OUTER APPLY
new Microsoft.Analytics.Samples.Formats.Xml.XmlApplier ("Products","/",new SQL.MAP<string, string>{
{"/abc/p/s", "ProductSid"},
//....
}) AS pp(
ProductSid string,
/*....*/
);
The full version of the code is here
I've tried to minimize my XML node by replacing names by letter. Unfortunately, it didn't help because of thousands of the items inside (+ long names of the products) broke through the limitation anyway.