I'm making HTTP queries to a website, and the response I get is in XML format. I want to make multiple queries, parse the data, and store it in an ArrayList or some other container so I can easily access each query's data. I've spent some time playing with SAX to parse the response. The examples I read used XML like this:

<?xml version="1.0"?>
<company>
        <staff>
                <firstname>yong</firstname>
                <lastname>mook kim</lastname>
                <nickname>mkyong</nickname>
                <salary>100000</salary>
        </staff>
        <staff>
                <firstname>low</firstname>
                <lastname>yin fong</lastname>
                <nickname>fong fong</nickname>
                <salary>200000</salary>
        </staff>
</company>

I managed to parse this format pretty easily just by following examples on the internet.

But in my case I need to parse data like this:

<?xml version="1.0" encoding="UTF-8"?>
<root response="True">
<movie title="A Good Marriage" year="2014" rated="R" released="03 Oct 2014" runtime="102 min" genre="Thriller" director="Peter Askin" writer="Stephen King (short story)" actors="Joan Allen, Anthony LaPaglia, Stephen Lang, Cara Buono" plot="After 25 years of a good marriage, what will Darcy do once she discovers her husband's sinister secret?" language="English" country="USA" awards="N/A" poster="http://ia.media-imdb.com/images/M/MV5BMTk3MjY2ODgwNl5BMl5BanBnXkFtZTgwMTQ0Mjg0MjE@._V1_SX300.jpg" metascore="43" imdbRating="5.1" imdbVotes="2,016" imdbID="tt2180994" type="movie"/>
</root>

From this response I want to parse all the values into some container so they're easy to use. I'm still learning, so maybe someone can point me in the right direction? :) Making the queries is not a problem, but parsing and storing the data is.

EDIT: To be clearer, my problem is that the response from the server isn't in a neat XML format like the first example. As you can see, it looks like this:

<movie title="A Good Marriage" year="2014" rated="R" released="03 Oct 2014" runtime="102 min" genre="Thriller" director="Peter Askin" writer="Stephen King (short story)" actors="Joan Allen, Anthony LaPaglia, Stephen Lang, Cara Buono" plot="After 25 years of a good marriage, what will Darcy do once she discovers her husband's sinister secret?" language="English" country="USA" awards="N/A" poster="http://ia.media-imdb.com/images/M/MV5BMTk3MjY2ODgwNl5BMl5BanBnXkFtZTgwMTQ0Mjg0MjE@._V1_SX300.jpg" metascore="43" imdbRating="5.1" imdbVotes="2,016" imdbID="tt2180994" type="movie"/>

When I run my code, it doesn't print anything. But when I manually modify the XML a bit, like this:

<?xml version="1.0" encoding="UTF-8"?>
<root response="True">
<movie> title="Oblivion" year="2013" rated="PG-13" released="19 Apr 2013" runtime="124 min" genre="Action, Adventure, Mystery" director="Joseph Kosinski" writer="Karl Gajdusek (screenplay), Michael Arndt (screenplay), Joseph Kosinski (graphic novel original story)" actors="Tom Cruise, Morgan Freeman, Olga Kurylenko, Andrea Riseborough" plot="A veteran assigned to extract Earth's remaining resources begins to question what he knows about his mission and himself." language="English" country="USA" awards="10 nominations." poster="http://ia.media-imdb.com/images/M/MV5BMTQwMDY0MTA4MF5BMl5BanBnXkFtZTcwNzI3MDgxOQ@@._V1_SX300.jpg" metascore="54" imdbRating="7.0" imdbVotes="307,845" imdbID="tt1483013" type="movie"/>
</movie>
</root>

So I added a closing > to the movie start tag and an end tag </movie> at the end. Now my program prints:

Movie :  title="Oblivion" year="2013" rated="PG-13" released="19 Apr 2013" runtime="124 min" genre="Action, Adventure, Mystery" director="Joseph Kosinski" writer="Karl Gajdusek (screenplay), Michael Arndt (screenplay), Joseph Kosinski (graphic novel original story)" actors="Tom Cruise, Morgan Freeman, Olga Kurylenko, Andrea Riseborough" plot="A veteran assigned to extract Earth's remaining resources begins to question what he knows about his mission and himself." language="English" country="USA" awards="10 nominations." poster="http://ia.media-imdb.com/images/M/MV5BMTQwMDY0MTA4MF5BMl5BanBnXkFtZTcwNzI3MDgxOQ@@._V1_SX300.jpg" metascore="54" imdbRating="7.0" imdbVotes="307,845" imdbID="tt1483013" type="movie"/>

So basically, the code I'm using at the moment reads everything between <movie> and </movie>. The problem is that the original response from the server leaves the movie tag open, like this: <movie title="Oblivion"..., and doesn't have a </movie> tag either.

I've been struggling with this for quite a while; hopefully someone understands my confusing explanation! At the moment my parser code looks like this:

public void getXml() {
    try {
        // obtain and configure a SAX based parser
        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();

        // obtain object for SAX parser
        SAXParser saxParser = saxParserFactory.newSAXParser();

        // anonymous handler class; all three callback methods
        // are written in the handler's body
        DefaultHandler defaultHandler = new DefaultHandler() {

            String movieTag = "close";

            // called every time the parser reads a start tag;
            // marks the movie element as open
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {

                if (qName.equalsIgnoreCase("MOVIE")) {
                    movieTag = "open";
                }
            }

            // called with the text found between a start tag and its end tag
            public void characters(char ch[], int start, int length)
                    throws SAXException {

                if (movieTag.equals("open")) {
                    System.out.println("Movie : " + new String(ch, start, length));
                }
            }

            // called by the parser whenever an end tag is found in the XML;
            // sets the flag back to "close"
            public void endElement(String uri, String localName, String qName)
                    throws SAXException {

                if (qName.equalsIgnoreCase("MOVIE")) {
                    movieTag = "close";
                }
            }
        };

        // parse the XML at the given path using the supplied handler;
        // this calls startElement(), endElement() and characters()
        // accordingly
        saxParser.parse("xml/testi.xml", defaultHandler);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Any help is greatly appreciated.

mpak

1 Answer

You can still use a SAX parser, which you've been learning. You didn't mention which parser you're using; I use Xerces (from Apache.org).

What you might want to do is implement a class that extends DefaultHandler. If you're using Eclipse as your IDE, you can have Eclipse generate stubs for all the methods from DefaultHandler, then add debug output to each of them to get a better feel for what happens.

But the important method is this:

public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException

All your fields (title, year, rated, etc.) will be available in the attributes parameter, which is an org.xml.sax.Attributes object rather than an array.

What you'll then get:

- A call to startElement for the <root> element
- A call to startElement for the <movie> element

Plus other calls you don't care about. So once you understand what you're doing, you can delete the methods that are nothing but debug statements, if you want.
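To make the attribute approach concrete, here is a minimal sketch using the standard javax.xml.parsers SAX API. The class name MovieAttributeParser and the parse method are made up for illustration, and storing each movie as a Map inside a List is just one possible container choice, not the only way to do it. It copies every attribute of each <movie> element into a map, so no characters() or endElement() handling is needed for this response format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class MovieAttributeParser {

    // Parses one XML response and returns one map (attribute name -> value)
    // per <movie> element found in it.
    static List<Map<String, String>> parse(String xml) {
        final List<Map<String, String>> movies = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                if (!qName.equalsIgnoreCase("movie")) {
                    return; // skip <root> and any other element
                }
                // LinkedHashMap keeps the attributes in document order.
                Map<String, String> movie = new LinkedHashMap<>();
                for (int i = 0; i < attributes.getLength(); i++) {
                    movie.put(attributes.getQName(i), attributes.getValue(i));
                }
                movies.add(movie);
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML response", e);
        }
        return movies;
    }

    public static void main(String[] args) {
        // Shortened version of the response from the question.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<root response=\"True\">"
                + "<movie title=\"Oblivion\" year=\"2013\" imdbID=\"tt1483013\" type=\"movie\"/>"
                + "</root>";

        for (Map<String, String> movie : parse(xml)) {
            System.out.println(movie.get("title") + " (" + movie.get("year") + ")");
        }
    }
}
```

For multiple queries you could call parse once per response and add the results to one list with addAll. If you'd rather have typed fields than string lookups, you could fill a small Movie class from the same attributes instead of a map.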

Joseph Larson
  • Thanks for the answer and sorry for slow response, gotta try later adding some debug outputs to get better idea. I'm using NetBeans IDE and I'm using javax.xml.parsers.SAXParser class. – mpak Feb 20 '15 at 23:11