
I am trying to write out a very large XML document using the code below. I am processing 200K-350K objects/nodes, and writing the output to file is unbearably slow.

Any suggestions on how to improve the performance of the output implementation? I understand that the `IndentingXMLStreamWriter` may be one of the culprits, but I really need the output to be human readable (even if, given its size, it is unlikely to actually be read).

Driver implementation:

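import java.io.FileNotFoundException;
import javax.xml.bind.JAXBException;
import javax.xml.stream.XMLStreamException;

public class SomeClient {
    public static void main(String[] args) {
        try {
            // the constructor throws JAXBException, so it has to be inside the try
            TransactionXmlWriter txw      = new TransactionXmlWriter();
            TransactionType      tranType = getNextTransaction();   // poster's data source, not shown

            txw.openXmlOutput("someFileName.xml");
            while (tranType != null) {
                txw.processObject(tranType);
                tranType = getNextTransaction();
            }
            txw.closeXmlOutput();
        } catch (JAXBException | FileNotFoundException | XMLStreamException e) {
            // report failures instead of silently swallowing them
            e.printStackTrace();
        }
    }
}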

Implementation class:

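import com.sun.xml.txw2.output.IndentingXMLStreamWriter;   // from the JAXB "txw2" library
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class TransactionXmlWriter {

    private final QName root = new QName("ipTransactions");

    private Marshaller       marshaller       = null;
    private FileOutputStream fileOutputStream = null;
    private XMLOutputFactory xmlOutputFactory = null;
    private XMLStreamWriter  xmlStreamWriter  = null;

    // constructor: build the (expensive) JAXBContext and marshaller once, not per object
    public TransactionXmlWriter() throws JAXBException {
        JAXBContext jaxbContext = JAXBContext.newInstance(TransactionType.class);

        xmlOutputFactory = XMLOutputFactory.newFactory();
        marshaller       = jaxbContext.createMarshaller();
        // fragment mode: don't emit document-level events (e.g. <?xml ...?>) for each object
        marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
    }

    // write out the "body" of the XML, one transaction at a time
    public void processObject(TransactionType transaction) {
        try {
            JAXBElement<TransactionType> transactionJaxB =
                    new JAXBElement<>(root, TransactionType.class, transaction);
            marshaller.marshal(transactionJaxB, xmlStreamWriter);
        } catch (JAXBException e) {
            // TODO: some kind of error handling
            // (printing e.getStackTrace() directly only prints the array's identity)
            e.printStackTrace();
        }
    }

    // open file to write XML into (note: FileOutputStream itself is unbuffered)
    public void openXmlOutput(String fileName) throws FileNotFoundException,
                                                      XMLStreamException {
        fileOutputStream = new FileOutputStream(fileName);
        xmlStreamWriter  = new IndentingXMLStreamWriter(
                xmlOutputFactory.createXMLStreamWriter(fileOutputStream, "UTF-8"));
        writeXmlHeader();
    }

    // write XML footer and close the stream/file
    public void closeXmlOutput() throws XMLStreamException {
        writeXmlFooter();
        xmlStreamWriter.flush();
        xmlStreamWriter.close();   // per the StAX contract, this does NOT close fileOutputStream
        try {
            fileOutputStream.close();
        } catch (IOException e) {
            throw new XMLStreamException("could not close output file", e);
        }
    }

    private void writeXmlHeader() throws XMLStreamException {
        xmlStreamWriter.writeStartDocument("UTF-8", "1.0");
        xmlStreamWriter.writeStartElement("ipTransactions");
    }

    private void writeXmlFooter() throws XMLStreamException {
        xmlStreamWriter.writeEndElement();
        xmlStreamWriter.writeEndDocument();
    }
}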
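Robert's comment below (buffer the file stream) is the change that gave the 14x improvement. Applied to `openXmlOutput`, it amounts to wrapping the `FileOutputStream` in a `BufferedOutputStream` before handing it to `createXMLStreamWriter`. The following is a minimal, self-contained sketch of that idea using only the JDK's StAX API; the class name `BufferedXmlOutput`, the `transaction` element and the 64 KB buffer size are assumptions for illustration. `IndentingXMLStreamWriter` is left out only because it is not part of the JDK; it can wrap `w` exactly as in `openXmlOutput`.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class BufferedXmlOutput {

    // Assumed buffer size; the comment suggests "min 16KB".
    static final int BUFFER_SIZE = 64 * 1024;

    // Writes `count` placeholder <transaction> elements (hypothetical name)
    // inside an <ipTransactions> root, through a buffered file stream.
    static void write(String fileName, int count) throws IOException, XMLStreamException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(fileName), BUFFER_SIZE)) {
            XMLStreamWriter w = XMLOutputFactory.newFactory().createXMLStreamWriter(out, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("ipTransactions");
            for (int i = 0; i < count; i++) {
                w.writeStartElement("transaction");
                w.writeAttribute("id", Integer.toString(i));
                w.writeEndElement();
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.flush();   // push the writer's output into the buffered stream
            w.close();   // does NOT close `out`; try-with-resources flushes and closes it
        }
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        write("buffered.xml", 300_000);
    }
}
```

Whichever way the buffer is added, the buffered stream has to be closed (or at least flushed) when the document is finished; otherwise the tail of the file can be lost.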
  • I would never write directly to a `FileOutputStream`. For file operations I would always recommend a stream that buffers the data (`BufferedOutputStream`, with a buffer of at least 16 KB) so that the write speed is optimal. – Robert Jan 18 '18 at 19:23
  • Robert, thanks. That change alone resulted in a 14x improvement in overall execution time. – SoCal Jan 18 '18 at 22:24
  • I am also looking into using Woodstox to see how much of a performance gain I can get. Any other or additional suggestions would be greatly appreciated. – SoCal Jan 18 '18 at 22:25
  • Write serialization code directly (take an object, write tags and values as strings into the stream). Measure the difference; that's the fastest you can eventually get. I ran your code (with a small TransactionType class): for 300_000 objects it runs in ~4s on my laptop, and direct serialization in ~0.8s. The difference is most likely related to reflection and other JAXB implementation specifics. If it's mostly reflection, then you will get the same results with all other reflection-based libraries. Otherwise you might get a 2-3x improvement and still use a general tool. – starikoff Jan 21 '18 at 00:28
  • "Write serialization code directly (take an object, write tags and values as strings into the stream)..." I am hoping to avoid that, as there are going to be quite a few non-trivial XML schemas that will eventually be output. However, I'll take a look at that approach, as your execution times absolutely kill what I've observed. – SoCal Jan 22 '18 at 04:15
  • On the comment about "reflection": shouldn't that mostly be a one-time hit (i.e., when the JAXB context is instantiated)? I have done some work with reflection before and noticed that, once the actual reflection had been performed, subsequent executions showed no significant degradation (i.e., it was an upfront hit, not a recurring one). – SoCal Jan 22 '18 at 04:17
  • [Seems like](https://stackoverflow.com/questions/2133732/does-jaxb-use-bytecode-instrumentation) my view of how JAXB uses reflection was overly simplistic. – starikoff Jan 22 '18 at 08:16
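starikoff's suggestion above is to skip JAXB and serialize each object by hand. Below is a minimal sketch of that idea, assuming a hypothetical two-field `Transaction` class as a stand-in for the real `TransactionType` (whose fields are not shown in the question). It writes through `XMLStreamWriter` rather than concatenating raw strings, so the writer still handles escaping, and it emits the indentation whitespace itself, which keeps the file human readable without `IndentingXMLStreamWriter`. No timings are claimed for it; the only measurements are the ones in the comments.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class DirectTransactionWriter {

    // Hypothetical stand-in for the poster's JAXB-generated TransactionType.
    static final class Transaction {
        final String id;
        final String payee;
        Transaction(String id, String payee) { this.id = id; this.payee = payee; }
    }

    private final XMLStreamWriter w;

    DirectTransactionWriter(OutputStream out) throws XMLStreamException {
        w = XMLOutputFactory.newFactory().createXMLStreamWriter(out, "UTF-8");
    }

    void open() throws XMLStreamException {
        w.writeStartDocument("UTF-8", "1.0");
        w.writeCharacters("\n");
        w.writeStartElement("ipTransactions");
    }

    // Hand-written mapping: one call per tag, no reflection. Indentation is
    // written as plain whitespace text between the elements.
    void write(Transaction t) throws XMLStreamException {
        w.writeCharacters("\n  ");
        w.writeStartElement("transaction");
        w.writeAttribute("id", t.id);
        w.writeCharacters("\n    ");
        w.writeStartElement("payee");
        w.writeCharacters(t.payee);   // the writer escapes characters such as '&' and '<'
        w.writeEndElement();
        w.writeCharacters("\n  ");
        w.writeEndElement();
    }

    void close() throws XMLStreamException {
        w.writeCharacters("\n");
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();
        w.close();   // does not close the underlying OutputStream
    }

    public static void main(String[] args) throws XMLStreamException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DirectTransactionWriter dw = new DirectTransactionWriter(buf);
        dw.open();
        dw.write(new Transaction("1", "Smith & Sons"));
        dw.close();
        System.out.println(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

For a real file, pass a `BufferedOutputStream` instead of the `ByteArrayOutputStream`. The trade-off the poster raises still applies: every schema needs its own hand-written mapping, which is exactly the maintenance cost JAXB avoids.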
