
I'm having problems looping over an XML file of about 20-30 MB (650,000 rows).

This is my meta-code:

<cffile action="READ" file="file.xml" variable="usersRaw">

<cfset usersXML = XmlParse(usersRaw)>
<cfset advsXML = XmlSearch(usersXML, "/advs/advuser")>
<cfset users = XmlSearch(usersXML, "/advs/advuser/user")>

<cfset numUsers = ArrayLen(users)>
<cfloop index="i" from="1" to="#numUsers#">
    ... some selects...
    ... insert...
    <cfset advs = advsXML[i]["vehicle"]>
    <cfset numAdvs = ArrayLen(advs)> 
    <cfloop index="k" from="1" to="#numAdvs#">        
        ... insert... or ... update...
    </cfloop>
</cfloop>

The structure of the XML file is (yes, it is not very good :-)):

<advs>
   <advuser>
      <user>
      </user>
      <vehicle></vehicle>
      <vehicle></vehicle>
   </advuser>
</advs>

After ~120,000 rows I get an error: "Out of memory".

How can I improve the performance of my script?

How can I diagnose where memory consumption peaks?

ale
Roberto

4 Answers


@SamG is correct that ColdFusion's built-in XML parsing can't handle this, because it uses a DOM parser that loads the whole document into memory. SAX is painful, though. Instead, use a StAX parser, which provides a much simpler iterator interface. See my answer to another question for an example of how to do this with ColdFusion.

This is roughly what you'd do for your example:

<cfset fis = createObject("java", "java.io.FileInputStream").init(
    "#getDirectoryFromPath(getCurrentTemplatePath())#/file.xml"
)>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>

<cfloop condition="#reader.hasNext()#">
    <cfset event = reader.next()>
    <cfif event EQ reader.START_ELEMENT>
        <cfswitch expression="#reader.getLocalName()#">
            <cfcase value="advs">
                <!--- root node, do nothing --->
            </cfcase>
            <cfcase value="advuser">
                <!--- set values used later on for inserts, selects, updates --->
            </cfcase>
            <cfcase value="user">
                <!--- some selects and insert --->
            </cfcase>
            <cfcase value="vehicle">
                <!--- insert or update --->
            </cfcase>
        </cfswitch>
    </cfif>
</cfloop>

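<cfset reader.close()>
<!--- reader.close() does not close the underlying input stream, so close it explicitly --->
<cfset bis.close()>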
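For readers who want to try the same loop outside ColdFusion, here is a minimal plain-Java sketch of the StAX iteration. It is only an illustration: the class name `AdvUserStream`, the method `vehiclesPerAdvUser`, and the inline sample document are made up for this example, and it counts `vehicle` elements where a real import would run its selects, inserts, and updates.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original answer.
class AdvUserStream {

    // Returns the number of <vehicle> elements seen inside each <advuser>, in document order.
    static List<Integer> vehiclesPerAdvUser(Reader in) {
        List<Integer> counts = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int vehicles = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("advuser")) {
                        vehicles = 0;      // a new record starts
                    } else if (name.equals("vehicle")) {
                        vehicles++;        // a real import would insert or update here
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("advuser")) {
                    counts.add(vehicles);  // the record is complete
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<advs>"
                + "<advuser><user></user><vehicle></vehicle><vehicle></vehicle></advuser>"
                + "<advuser><user></user></advuser>"
                + "</advs>";
        System.out.println(vehiclesPerAdvUser(new StringReader(xml))); // prints [2, 0]
    }
}
```

Only the current event is held in memory at any time, which is why this style should scale to files that a DOM parse cannot handle. In a real import you would do the database work at the `advuser` end tag instead of collecting results in a list.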
orangepips
  • Is ColdFusion a JavaScript-compatible environment? – vtd-xml-author Feb 14 '11 at 18:43
  • @vtd-xml-author: explain what you mean by "compatible environment". ColdFusion's primary purpose is a server side language for responding to HTTP requests. As such, its output is typically HTML, CSS and JavaScript. And, it has features for responding to AJAX requests easily. – orangepips Feb 14 '11 at 18:47
  • Just out of curiosity, what if you used jQuery to parse the XML and then sent the results to a ColdFusion CFC, which would process the data line by line? Just a thought. – crosenblum Feb 14 '11 at 20:02
  • Haven't tried it, but I believe JavaScript (and by extension jQuery) also creates a DOM representation, so that would probably end up being memory constrained as well. – orangepips Feb 14 '11 at 20:32

orangepips provides a reasonable solution. Please also take a look at Ben Nadel's solution for handling very large XML files in ColdFusion. I have tested his approach on a 50 MB XML file with 1.2 million lines. Ben uses an approach similar to the one orangepips provides here: stream the file using Java, then XMLParse each node in ColdFusion to get to the goods. Check it out. Like most of Ben Nadel's code and tutorials, it just works.

http://www.bennadel.com/blog/1345-Ask-Ben-Parsing-Very-Large-XML-Documents-In-ColdFusion.htm
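The stream-then-parse-each-node idea described above can be sketched in plain Java as follows. This is not Ben Nadel's code, and it swaps in a different mechanism: it uses StAX to find each record and copies only that record's subtree into a small DOM element. The names `RecordDomStream`, `readSubtree`, and `forEachRecord` are hypothetical, and the sketch ignores namespaces, comments, and processing instructions.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class for illustration only.
class RecordDomStream {

    // Copies the element the reader is positioned on, with its attributes,
    // text, and child elements, into a DOM element owned by doc.
    static Element readSubtree(XMLStreamReader reader, Document doc) throws XMLStreamException {
        Element element = doc.createElement(reader.getLocalName());
        for (int i = 0; i < reader.getAttributeCount(); i++) {
            element.setAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
        }
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                element.appendChild(readSubtree(reader, doc));
            } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                element.appendChild(doc.createTextNode(reader.getText()));
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                break; // end tag of this element
            }
        }
        return element;
    }

    // Streams the input and hands each element named recordName to handler as a small DOM tree.
    static void forEachRecord(Reader in, String recordName, Consumer<Element> handler) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(recordName)) {
                    // A fresh Document per record lets the previous one be garbage collected.
                    handler.accept(readSubtree(reader, builder.newDocument()));
                }
            }
            reader.close();
        } catch (XMLStreamException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not stream XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<advs>"
                + "<advuser><user></user><vehicle></vehicle><vehicle></vehicle></advuser>"
                + "<advuser><user></user></advuser>"
                + "</advs>";
        // Prints the number of vehicle elements in each advuser record.
        forEachRecord(new StringReader(xml), "advuser",
                record -> System.out.println(record.getElementsByTagName("vehicle").getLength()));
    }
}
```

The trade-off is that each record can still be navigated like a tree, while memory use is bounded by the largest single record, not by the whole file.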

Marty McGee

I believe the ColdFusion XML parser uses DOM parsing, which is not suitable for files of this size. You should try to find a SAX parser, which is event-driven. Maybe this link will help: http://coldfusion.sys-con.com/node/236002
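To show what "event-driven" means here, below is a small Java sketch using the standard SAX API (`SAXParserFactory` and `DefaultHandler`). The class name `SaxElementCounter` and the sample document are invented for illustration. The parser calls `startElement` once per opening tag and never builds a tree.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
class SaxElementCounter {

    // Counts how often each element name occurs, without building a DOM tree.
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum); // called by the parser for every opening tag
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<advs>"
                + "<advuser><user></user><vehicle></vehicle><vehicle></vehicle></advuser>"
                + "<advuser><user></user></advuser>"
                + "</advs>";
        System.out.println(countElements(xml)); // prints {advs=1, advuser=2, user=2, vehicle=2}
    }
}
```

With SAX the parser drives your code through callbacks, so any state you need across elements (for example, which `advuser` you are currently inside) has to be tracked in the handler yourself.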

SamG

I don't know ColdFusion, but 20-30 MB is not out of range for technologies that build an in-memory tree; many people routinely run XSLT transformations on 200 MB files.

Moving to SAX parsing sounds like an extreme measure - it's such a low-level interface.
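One rough way to check how much heap an in-memory tree really costs is to measure it. The Java sketch below parses a synthetic document with the JDK's DOM parser and prints the approximate heap growth. `DomHeapProbe`, `syntheticXml`, and the record count are assumptions made for this example. `System.gc()` is only a hint, so treat the numbers as estimates, and note that ColdFusion's own XML objects may well cost more per node than a bare DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class DomHeapProbe {

    // Approximate number of heap bytes currently in use.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Builds a document shaped like the one in the question, with empty elements.
    static String syntheticXml(int records) {
        StringBuilder sb = new StringBuilder("<advs>");
        for (int i = 0; i < records; i++) {
            sb.append("<advuser><user></user><vehicle></vehicle><vehicle></vehicle></advuser>");
        }
        return sb.append("</advs>").toString();
    }

    // Parses the whole document into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = syntheticXml(100_000);
        System.gc();
        long before = usedHeap();
        Document doc = parse(xml);
        System.gc();
        long after = usedHeap();
        System.out.printf("input: %d chars, approximate heap growth: %d bytes%n",
                xml.length(), after - before);
        // Use the document after measuring so it stays reachable until here.
        System.out.println("records: " + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Comparing the heap growth with the size of the input gives a feel for the tree's overhead factor, which can then be set against the heap available to the server.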

Michael Kay