0

The XMLParse operator throws this error while working with large XML files: The following error occurred during XML parsing: internal error: Huge input lookup

The documentation says this has been fixed in Streams 4.2.1.3, where the following parameter can be added to the XMLParse operator to fix it: xmlParseHuge : true;

The above parameter is not supported in lower versions of Streams. How do I fix this in Streams 4.2.1.1?

Ankit Sahay
  • 1,710
  • 8
  • 14

2 Answers

0

If the XML data is coming from a FileSource, try the workaround of reading the file with a smaller block size: set blockSize to 10000u*1024u so that large XML documents parse successfully.

stream<blob dataBlob, rstring fName> FileLoadedFromFS = FileSource(DirFileScanned) {
    param
        format      : block;
        blockSize   : 10000u * 1024u;
        compression : gzip;
        parsing     : fast;
    output
        FileLoadedFromFS : fName = FileName();
}

From: http://www-01.ibm.com/support/docview.wss?uid=swg1IT22914

ndsilva
  • 309
  • 1
  • 8
  • I have seen this workaround in the documentations, but the XML is not coming from a file but MQ. Is there a solution for that? – Ankit Sahay Apr 03 '18 at 03:18
  • I noticed you also posted on the Streamsdev forums - did you try the workaround listed there of splitting the blob using the Parse operator? https://developer.ibm.com/answers/questions/440051/xml-parse-operator-throws-error-when-working-with/ – ndsilva Apr 05 '18 at 15:02
  • @ndsilva : No, I did not use that option. I used the topology toolkit to make a Python operator. I have written it as an answer below. Please check it out. – Ankit Sahay Apr 06 '18 at 21:35
0

There was no better way to do this in Streams 4.2.1.1, so I finally decided to use the topology toolkit to write a Python operator. The XML tuples were passed through this operator, and the xml.etree.ElementTree library was used to parse the XML, extract the required data, and return the tuple. A sketch of the approach is shown below.
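
For reference, here is a minimal sketch of that approach. It assumes the streamsx.topology Python API is available and that each incoming tuple carries the XML document as a plain string; the element names, field names, and the in-line sample document are placeholder assumptions, not the real application schema.

import xml.etree.ElementTree as ET

from streamsx.topology.topology import Topology
from streamsx.topology import context


def parse_xml(xml_string):
    # Parse the whole document with ElementTree, pull out the values the
    # downstream logic needs, and return them as a dict (the output tuple).
    # The element names here are placeholders.
    root = ET.fromstring(xml_string)
    return {
        "order_id": root.findtext("orderId", default=""),
        "amount": root.findtext("amount", default="")
    }


topo = Topology("LargeXmlParse")

# In the real job the source stream is the XML arriving from MQ; a literal
# document stands in for it here so the sketch is self-contained.
xml_docs = topo.source(lambda: ["<order><orderId>1</orderId><amount>10</amount></order>"])

parsed = xml_docs.map(parse_xml)
parsed.print()

context.submit("STANDALONE", topo)

Because the parsing happens inside the Python callable rather than in the SPL XMLParse operator, the "Huge input lookup" limit is never hit.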

Ankit Sahay
  • 1,710
  • 8
  • 14