2

I have a XML file (test.xml) like this one:

<?xml version="1.0" encoding="ISO-8859-1"?>
<s2xResponse>
  <s2xData>
    <Name>This is the name</Name>
    <InfocomData>
      <DateOfUpdate day="07" month="02" year="2018">20180207</DateOfUpdate>
      <CompanyName>MY COMPANY</CompanyName>
      <TaxCode FlagCheck="0">XXXYYYWWWZZZ</TaxCode>
    </InfocomData>
    <AssessmentSummary>
      <Rating Code="2">Rating Description for Code 2</Rating>
    </AssessmentSummary>
    <AssessmentData>
      <SectorialDistribution>
        <CompaniesNumber>11650</CompaniesNumber>
        <ScoreDistribution />
        <CervedScoreDistribution>
          <DistributionData>
            <Rating Code="1">SICUREZZA</Rating>
            <Percentage>1.91</Percentage>
          </DistributionData>
          <DistributionData>
            <Rating Code="2">SOLVIBILITA' ELEVATA</Rating>
            <Percentage>35.56</Percentage>
          </DistributionData>
        </CervedScoreDistribution>
      </SectorialDistribution>
    </AssessmentData>
  </s2xData>
</s2xResponse>

I'm trying to get the "Name" node text ("This is the name") with a U-SQL script using the XmlExtractor. The following is the code I'm using:

USE TestXML; // It contains the registered assembly

REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@xml = EXTRACT xml_text string
       FROM "textxpath/test.xml"
       USING Extractors.Text(rowDelimiter: "^", quoting: false);

@xml_cleaned =
    SELECT
        xml_text.Replace("\r\n", "").Replace("\t", "    ") AS xml_text
    FROM @xml;

@values =
    SELECT Microsoft.Analytics.Samples.Formats.Xml.XPath.Evaluate(xml_text, "s2xResponse/s2xData/Name")[1] AS value
    FROM @xml_cleaned;


OUTPUT @values TO @"outputs/test_xpath.txt" USING Outputters.Text(quoting: false);

But I'm getting this runtime error:

Execution failed with error '1_SV1_Extract Error : '{"diagnosticCode":195887116,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXPRESSIONEVALUATION","message":"Error while evaluating expression Microsoft.Analytics.Samples.Formats.Xml.XPath.Evaluate(xml_text.Replace(\"\r\n\", \"\").Replace(\"\t\", \" \"), \"s2xResponse/s2xData/Name\")[1]","description":"Inner exception from user expression: Index was out of range. Must be non-negative and less than the size of the collection.

I get the same error even if I use a zero index for the Evaluate result ([0]).

What's wrong with my query?

lucazav
  • 858
  • 9
  • 24

2 Answers2

2

The problem here is that you are applying the subscript [1] to the result of XPath.Evaluate, which I believe will be returning the Name nodes. However, you are applying the [1] subscript in code, not in XPath, so the subscript is likely to be zero based, and not 1-based as it is in XPath, hence the Index out of range error.

Here's one solution - simply apply the subscript operator in Xpath (where it is still 1-based), and select the text() there

 .Evaluate("s2xResponse/s2xData/Name[1]/text()")
StuartLC
  • 104,537
  • 17
  • 209
  • 285
  • Or possibly `(s2xResponse/s2xData/Name)[1]` – StuartLC Jun 08 '18 at 14:08
  • The Evaluate method (with a single XPath string as parameter) returns a SqlArray, so the [0] index is mandatory. I used this form: `Evaluate(xml_text, "s2xResponse/s2xData/Name[1]/text()")[0]` and it works like a charm :) The key point I was missing was the using of text() at the end of the xpath. Infact if I use the following form: `Evaluate(xml_text, "s2xResponse/s2xData/Name/text()")[0]` it works fine too. Thanks! – lucazav Jun 08 '18 at 14:58
1

Is there a particular reason you want to use the Evaluate method? I got his to work using the XmlDomExtractor, which would allow you to extract multiple values from the xml, eg

REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @inputFile string = "/input/input100.xml";

@input =
    EXTRACT Name string
    FROM @inputFile
    USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(rowPath : "/s2xResponse",
          columnPaths : new SQL.MAP<string, string>{
          { "s2xData/Name", "Name" },
          }
          );


@output =
    SELECT *
    FROM @input;
wBob
  • 13,710
  • 3
  • 20
  • 37
  • Thank you for pointing out that, @wBob! Sincerely, the XmlExtractor is incredibly badly documented and I didn't catch the differences between all the classes it contains (XmlApplier, XmlDomExtractor, XmlExtractor, XPath) – lucazav Jun 08 '18 at 16:49