I need to use a SAX parser for efficiency, because the file is big. I usually use something like this:

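import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Example extends DefaultHandler
{
    // Names of the currently open elements, innermost on top
    private final Deque<String> stack = new ArrayDeque<> ();

    @Override
    public void startElement (String uri, String local, String qName, Attributes atts) throws SAXException
    {
        stack.push (qName);
    }

    @Override
    public void endElement (String uri, String local, String qName) throws SAXException
    {
        // Terminate the output line when a "line" element closes
        if ("line".equals (qName))
            System.out.println ();

        stack.pop ();
    }

    @Override
    public void characters (char buf [], int offset, int length) throws SAXException
    {
        // Only print text that sits directly inside a "line" element
        if (!"line".equals (stack.peek ()))
            return;

        System.out.print (new String (buf, offset, length));
    }
}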

example taken from here.

SAX is already an implementation of the Visitor pattern, but in my case I just need to take the content of each element and act on it according to the kind of element.

My typical XML file is something like:

<?xml version="1.0" encoding="utf-8"?>
<labs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <auth>
        <uid> </uid>
        <gid> </gid>
        <key> </key>
    </auth>
    <campaign>
        <sms>
            <newsletter>206</newsletter>
            <message>
                <from>Da Definire</from>
                <subject>Da definire</subject>
                <body><![CDATA[Testo Da Definire]]></body>
            </message>
            <delivery method="manual"></delivery>
            <recipients>
                <db>276</db>
                <filter>
                    <test>1538</test>
                </filter>
                <new_recipients>
                    <csv_file>Corso2012_SMS.csv</csv_file>
                </new_recipients>
            </recipients>
        </sms>
    </campaign>
</labs>

When I'm in the csv_file node, I need to take the filename and upload the users from that file; when I'm in filter/test, I need to check whether the filter exists, and so on. Is there a way to apply the Visitor pattern with SAX?

dierre

2 Answers

You could simply keep a Map<String, ElementHandler> in your SAX handler, and allow registering ElementHandlers for element names. Supposing that you're only interested in leaf elements:

  • each time an element starts, you clear a buffer and check whether the map contains a handler for this element name.
  • each time characters() is called, you append the characters to the buffer (if a handler was found for the element that just started).
  • each time an element ends, if a handler was found for the element that just started, you call the handler with the content of the buffer.

Here's an example:

private ElementHandler currentHandler;
private StringBuilder buffer = new StringBuilder();
private Map<String, ElementHandler> handlers = new HashMap<String, ElementHandler>();

public void registerHandler(String qName, ElementHandler handler) {
    handlers.put(qName, handler);
}

public void startElement (String uri, String local, String qName, Attributes atts) throws SAXException {
    // null when no handler is registered for this element name
    currentHandler = handlers.get(qName);
    buffer.setLength(0);
}

public void characters (char buf [], int offset, int length) throws SAXException {
    if (currentHandler != null) {
        buffer.append(buf, offset, length);
    }
}

public void endElement (String uri, String local, String qName) throws SAXException {
    if (currentHandler != null) {
        currentHandler.handle(buffer.toString());
        // reset, so the enclosing element's end tag does not trigger the handler again
        currentHandler = null;
    }
}
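The snippet above does not define ElementHandler or show how handlers get registered. As a minimal, self-contained sketch, assuming ElementHandler is a single-method interface taking the element text (an assumption, as are the class name DispatchingHandlerDemo, the run helper and the placeholder actions), it could be wired up like this:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DispatchingHandlerDemo {

    /** Hypothetical callback type; assumed to receive the text of a leaf element. */
    interface ElementHandler {
        void handle(String content);
    }

    static class DispatchingHandler extends DefaultHandler {
        private final Map<String, ElementHandler> handlers = new HashMap<>();
        private final StringBuilder buffer = new StringBuilder();
        private ElementHandler currentHandler;

        void registerHandler(String qName, ElementHandler handler) {
            handlers.put(qName, handler);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            currentHandler = handlers.get(qName); // null if nobody registered for this name
            buffer.setLength(0);
        }

        @Override
        public void characters(char[] buf, int offset, int length) {
            // May be called several times per element, so accumulate
            if (currentHandler != null) {
                buffer.append(buf, offset, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (currentHandler != null) {
                currentHandler.handle(buffer.toString().trim());
                currentHandler = null; // parent end tags must not fire the handler again
            }
        }
    }

    /** Parses the XML and returns the actions the registered handlers recorded. */
    static List<String> run(String xml) {
        List<String> actions = new ArrayList<>();
        DispatchingHandler handler = new DispatchingHandler();
        // Placeholder actions: real code would upload users / look up the filter here
        handler.registerHandler("csv_file", name -> actions.add("upload users from " + name));
        handler.registerHandler("test", id -> actions.add("check filter " + id));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return actions;
    }

    public static void main(String[] args) {
        String xml = "<recipients><db>276</db><filter><test>1538</test></filter>"
                + "<new_recipients><csv_file>Corso2012_SMS.csv</csv_file></new_recipients>"
                + "</recipients>";
        run(xml).forEach(System.out::println);
    }
}
```

Dispatching on qName alone means two elements with the same name in different places share one handler; keying the map on the element path instead would be one way around that.</f><n><csv_file>x.csv</csv_file></n></a>");
if (!out.equals(java.util.Arrays.asList("check filter 1538", "upload users from x.csv"))) throw new AssertionError(out.toString());
java.util.List<String> none = DispatchingHandlerDemo.run("<a><db>276</db></a>");
if (!none.isEmpty()) throw new AssertionError(none.toString());
</test>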
JB Nizet
  • What if I need to handle something that is not a leaf? For example, I have a new_recipient tag, and inside it I have something like: 1Paul. Those leaves are not standard; they depend on how the db fields are named. – dierre Jun 01 '12 at 21:20

Don't forget StAX. It probably won't make the Visitor pattern any easier, but if your documents are relatively simple and you're already planning on streaming them, it does have a simpler programming model than SAX. You just iterate over the events in the parsed stream, one at a time, ignoring or acting on them as you choose.
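As a rough sketch of what that loop can look like with the cursor API in javax.xml.stream (the class name StaxDemo, the run helper and the placeholder actions are assumptions for illustration, not something from the question):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {

    /** Pulls events from the stream and records an action for the elements of interest. */
    static List<String> run(String xml) {
        List<String> actions = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Skip everything except element starts
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                switch (reader.getLocalName()) {
                    case "csv_file":
                        // getElementText() reads the text and moves to the end tag
                        actions.add("upload users from " + reader.getElementText().trim());
                        break;
                    case "test":
                        actions.add("check filter " + reader.getElementText().trim());
                        break;
                    default:
                        break; // not interested in this element
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return actions;
    }

    public static void main(String[] args) {
        String xml = "<recipients><filter><test>1538</test></filter>"
                + "<new_recipients><csv_file>Corso2012_SMS.csv</csv_file></new_recipients>"
                + "</recipients>";
        run(xml).forEach(System.out::println);
    }
}
```

Because the calling code drives the loop, there is no shared state to carry between callbacks, which is most of what makes this model simpler than SAX for small dispatch tasks.</f><n><csv_file><![CDATA[x.csv]]></csv_file></n><db>276</db></a>");
if (!out.equals(java.util.Arrays.asList("check filter 1538", "upload users from x.csv"))) throw new AssertionError(out.toString());
</test>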

John Watts