5

I'm creating a very large XML file (700 MB+) from large amounts of data processed in batch. The program serves as an interface between an extremely large Sybase database and an application. I currently have the XSD schema bound to classes. I need a way to write the XML with restart logic in mind.

That is, I need to know where I left off. In other words, if the program fails, I need to be able to see what was last written to the XML file so that it can pick up from there. Here's an example.

<root>
  <WorkSet>
    <Work>
      <Customer/>
    </Work>
    <Work>
      <Customer/>
    </Work>
  </WorkSet>
  <WorkSet>
    <Work>
      .....
    </Work>
  </WorkSet>
</root>

Say the program fails after writing a 'Work' or 'WorkSet' node. Is there a way to pick up where I left off processing? I'm trying to avoid reading the XML file back into memory due to the sheer size of the XML file (say it finishes 500 MB of XML and then fails).

Thanks for the help.

TyC

2 Answers

3

If you can split your data into independent WorkSet elements, you can write them out one at a time with JAXB's fragment mode (the `Marshaller.JAXB_FRAGMENT` property, with which JAXB does not write the XML declaration). Later, simply concatenate the files and add the missing XML declaration and the opening and closing root tags.

It is possible that you will have to modify your generated classes for this, namely by adding @XmlRootElement to the WorkSet Java class. If a single WorkSet is still too big for one step, you can do the same with Work, but then you have to generate the missing enclosing tags yourself.
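The concatenation and restart steps could look roughly like the sketch below. It is only an illustration under several assumptions: the class name `FragmentAssembler`, the `workset-NNNNN.xml` file naming scheme, and the `root` element name are made up here, and the fragment strings stand in for whatever a JAXB marshaller in fragment mode would produce. Each fragment is written to a temporary file and renamed only when complete, so after a failure the next WorkSet to process is simply the first index without a file.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class FragmentAssembler {

    // Hypothetical naming scheme: one file per completed WorkSet.
    static Path fragmentPath(Path dir, int index) {
        return dir.resolve(String.format("workset-%05d.xml", index));
    }

    // Restart logic: the first index whose fragment file does not exist yet.
    static int nextIndex(Path dir) {
        int i = 0;
        while (Files.exists(fragmentPath(dir, i))) {
            i++;
        }
        return i;
    }

    // Write to a temporary file, then rename it, so a half-written
    // fragment never appears under its final name.
    static void writeFragment(Path dir, int index, String xml) {
        try {
            Path tmp = dir.resolve("workset.tmp");
            Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, fragmentPath(dir, index), StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stream the fragments into one document, adding the XML declaration
    // and the root tags; no fragment is held in memory as a whole.
    static void assemble(Path dir, int count, Path target) {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root>\n"
                    .getBytes(StandardCharsets.UTF_8));
            for (int i = 0; i < count; i++) {
                Files.copy(fragmentPath(dir, i), out);
                out.write('\n');
            }
            out.write("</root>\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates a run that stops after two WorkSets, then a restart.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("fragments");
            writeFragment(dir, 0, "<WorkSet id=\"0\"/>");
            writeFragment(dir, 1, "<WorkSet id=\"1\"/>");
            // "Restart": continue from the first missing fragment.
            for (int i = nextIndex(dir); i < 3; i++) {
                writeFragment(dir, i, "<WorkSet id=\"" + i + "\"/>");
            }
            Path target = dir.resolve("all.xml");
            assemble(dir, 3, target);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

In a real batch the string passed to `writeFragment` would be replaced by marshalling a WorkSet object directly to the temporary file, which keeps memory use bounded by a single WorkSet.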

palacsint
  • Would adding the @XmlRootElement annotation to the WorkSet Java class affect the functionality of the one large XML file after concatenation of multiple WorkSets? I'm somewhat new to JAXB, so let me know if I'm not understanding it correctly. – TyC Sep 14 '11 at 12:22
  • I don't think that adding an `@XmlRootElement` will affect your former functionality. Without adding the `@XmlRootElement` the `Marshaller.marshal()` throws the following exception: `com.sun.istack.SAXException2: unable to marshal type "org.package.MyWorkSet" as an element because it is missing an @XmlRootElement annotation`. – palacsint Sep 14 '11 at 14:36
  • I was able to marshal just the WorkSet element and all of its child elements without adding @XmlRootElement to the WorkSet class, using the JAXB fragment mode. But now it's applying the namespace attribute to the WorkSet elements, which seems strange, because when I previously marshalled the entire root and its child elements, they never had this attribute. Is there a way to remove this within JAXB, or will I just have to substring it out? – TyC Sep 14 '11 at 14:43
  • If it's working without `@XmlRootElement`, use it without that :) In my environment it throws the mentioned SAXException2. – palacsint Sep 14 '11 at 15:47
2

I don't think JAXB is the appropriate tool for this job, but ...

You could write a custom Marshaller implementation that keeps track of what objects have been marshalled and use the fragment mode to write out individual objects.

  • +1 JAXB doesn't sound like the right tool here. Why don't you just use a SAX parser, so you can track where you are in the file yourself? Take a look at StAX, for example: http://stax.codehaus.org/Home – Brad Nov 03 '11 at 17:47
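
One way to track the position yourself with StAX is sketched below. This is an illustration, not a tested production design: the class `CheckpointedXmlWriter`, the checkpoint file format (`nextIndex byteOffset`), and the `id` attribute are all assumptions made for the example. Each WorkSet is serialized with an `XMLStreamWriter` into a small buffer, appended to the output file, and only then recorded in a side checkpoint file. On restart the output is truncated to the last recorded offset, which discards any half-written WorkSet without reading the XML back in.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class CheckpointedXmlWriter {

    // Serialize one WorkSet with StAX; only this element is buffered in memory.
    static byte[] workSetBytes(int id) throws XMLStreamException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(buf, "UTF-8");
        w.writeStartElement("WorkSet");
        w.writeAttribute("id", Integer.toString(id)); // hypothetical attribute
        w.writeStartElement("Work");
        w.writeEmptyElement("Customer");
        w.writeEndElement(); // Work
        w.writeEndElement(); // WorkSet
        w.flush();
        w.close();
        return buf.toByteArray();
    }

    static void append(FileChannel ch, byte[] bytes) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    // Record "nextIndex byteOffset" via a temporary file and a rename.
    static void saveCheckpoint(Path ckpt, int next, long offset) throws IOException {
        Path tmp = ckpt.resolveSibling(ckpt.getFileName() + ".tmp");
        Files.write(tmp, (next + " " + offset).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, ckpt, StandardCopyOption.REPLACE_EXISTING);
    }

    // Writes WorkSets up to total, resuming from the checkpoint if one exists.
    // crashAt >= 0 simulates a failure in the middle of that WorkSet.
    static void run(Path xml, Path ckpt, int total, int crashAt)
            throws IOException, XMLStreamException {
        int next = 0;
        long offset = 0;
        if (Files.exists(ckpt)) {
            String[] parts = new String(Files.readAllBytes(ckpt), StandardCharsets.UTF_8)
                    .trim().split(" ");
            next = Integer.parseInt(parts[0]);
            offset = Long.parseLong(parts[1]);
        }
        try (FileChannel ch = FileChannel.open(xml,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.truncate(offset); // drop anything written after the last checkpoint
            ch.position(offset);
            if (next == 0) {
                append(ch, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root>"
                        .getBytes(StandardCharsets.UTF_8));
            }
            for (int i = next; i < total; i++) {
                byte[] ws = workSetBytes(i);
                if (i == crashAt) {
                    append(ch, Arrays.copyOf(ws, ws.length / 2)); // partial write
                    throw new IllegalStateException("simulated crash");
                }
                append(ch, ws);
                ch.force(false); // data reaches disk before the checkpoint claims it
                saveCheckpoint(ckpt, i + 1, ch.position());
            }
            append(ch, "</root>".getBytes(StandardCharsets.UTF_8));
        }
    }

    // First run fails inside WorkSet 1; the second run resumes and finishes.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("restart-demo");
            Path xml = dir.resolve("out.xml");
            Path ckpt = dir.resolve("out.ckpt");
            try {
                run(xml, ckpt, 3, 1);
            } catch (IllegalStateException expected) {
                // simulated failure, fall through to the restart
            }
            run(xml, ckpt, 3, -1);
            return new String(Files.readAllBytes(xml), StandardCharsets.UTF_8);
        } catch (IOException | XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The checkpoint is written only after the WorkSet bytes have been forced to disk, so the recorded offset never points past data that might be missing. The price is one `force` call per WorkSet; checkpointing every N WorkSets instead would trade some rework after a failure for fewer syncs.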