Thanks for reading!

Using an XML parsing tutorial as a reference, I am trying to parse a simple XML RSS feed with the following structure.

Everything works fine and all values are parsed except for the following case: I am not able to get the content of the <img> tag.


<feed>
    <title>This is Title</title>
    <count>10</count>
    <desc>
        This is a description for a sample feed <img src="http://someimagelink.com/img.jpg" />
    </desc>
    <link>This is link</link>
</feed>

This is what the endElement() method looks like:


@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    if(localName.equals("feed")) {
        //Add Records object to ArrayList
        //Feed is a POJO class to store all the feed content.
        //FeedList is an ArrayList to store multiple Feed objects.
        mFeedList.add(mFeed);
    }
    else if(localName.equals("title")) {
        mFeed.setTitle(currentValue.toString());
    }
    else if(localName.equals("count")) {
        mFeed.setCount(currentValue.toString());
    }
    else if(localName.equals("desc")) {
        mFeed.setDesc(currentValue.toString());
    }
    else if(localName.equals("img")) {
        //NEVER hits here :(
        mFeed.setImageUrl(currentValue.toString());
    }
    else if(localName.equals("link")) {
        //BUT, hits here
        mFeed.setLink(currentValue.toString());
    }
}

Since the <img> tag is nested inside the <desc> tag, the code in the "img" else-if branch never gets executed.

Note: When I read the <desc> tag, I could do a manual String search to retrieve the <img> tag content. But I am sure there has to be a more efficient way.

Can someone guide me on how to get the content of the <img> tag?

Thanks!

EDIT: Updated the <img> tag. It is now closed correctly.

EDIT2: Updating with startElement() code here. Also updated Feed XML and startElement() code.

@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {

    if(localName.equals("feed")) {
        //Instantiate Feed object
        mFeed = new Feed();
    }
    else if(localName.equals("title")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
    else if(localName.equals("count")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
    else if(localName.equals("desc")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
    else if(localName.equals("img")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
    else if(localName.equals("link")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }       
}
Sagar Hatekar
  • The XML is not well-formed... the "img" tag is never closed... – OcuS Apr 18 '11 at 22:21
  • Oops, sorry, I missed closing it while typing the question. The tag is now closed correctly. This is how the feed appears to me now. How do I extract the contents of the tag? Please help! – Sagar Hatekar Apr 19 '11 at 02:04

1 Answer

The <img> tag actually has no character content, and the value you're after has to be pulled out of the attributes.

To do this, you need to override startElement(String namespaceURI, String localName, String qName, Attributes atts), recognize the <img> tag more or less as you're doing, and get the value you need out of the atts parameter.
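As a rough illustration of that approach, here is a minimal sketch that reads the src attribute in startElement(). The class name ImgSrcDemo and the helper extractImageUrl() are hypothetical names made up for this example, not part of the question's code, and the sketch assumes the default JDK SAX parser. It falls back to qName when localName is empty, which is what a parser that is not namespace-aware reports.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original question's code.
class ImgSrcDemo {

    /** Remembers the src attribute of the first <img> element it sees. */
    static class ImgHandler extends DefaultHandler {
        String imageUrl;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // localName is empty when the parser is not namespace-aware, so fall back to qName.
            String name = (localName == null || localName.isEmpty()) ? qName : localName;
            if ("img".equals(name) && imageUrl == null) {
                // <img> has no character content; the URL lives in its attributes.
                imageUrl = atts.getValue("src"); // null if the attribute is absent
            }
        }
    }

    /** Parses the given XML string and returns the first img src, or null if there is none. */
    static String extractImageUrl(String xml) throws Exception {
        ImgHandler handler = new ImgHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.imageUrl;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><desc>Sample <img src=\"http://someimagelink.com/img.jpg\" /></desc></feed>";
        System.out.println(extractImageUrl(xml));
    }
}
```

In the handler from the question, the equivalent change would presumably be to call mFeed.setImageUrl(attributes.getValue("src")) inside the "img" branch of startElement() instead of buffering characters for it.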

Debugging help:

Using this (simple/stupid) handler:

package com.donroby.so;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DebugHandler extends DefaultHandler {

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)  throws SAXException {
        printParseInfo("startElement:", uri, localName, qName);
        int attributesLength = attributes.getLength();
        for (int i = 0; i < attributesLength; i++) {
            printAttributeInfo(attributes, i);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)  throws SAXException {
        printParseInfo("endElement:  ", uri, localName, qName);
    }

    @Override
    public void characters(char[] chars, int start, int length) throws SAXException {
        // Build the string directly from the reported slice of the buffer.
        String str = new String(chars, start, length);

        System.out.println("Characters: '" + str + "'");
    }

    private void printAttributeInfo(Attributes attributes, int i) {
        System.out.println(String.format("%s URI: '%s', localName: '%s', qName: '%s', Value: '%s'", "Attribute ",
                attributes.getURI(i), attributes.getLocalName(i), attributes.getQName(i), attributes.getValue(i)));
    }

    private void printParseInfo(String type, String uri, String localName, String qName) {
        System.out.println(String.format("%s URI: '%s', localName: '%s', qName: '%s'", type, uri, localName, qName));
    }

}

Output for the sample XML:

startElement: URI: '', localName: '', qName: 'feed'
Characters: '
    '
startElement: URI: '', localName: '', qName: 'title'
Characters: 'This is Title'
endElement:   URI: '', localName: '', qName: 'title'
Characters: '
    '
startElement: URI: '', localName: '', qName: 'count'
Characters: '10'
endElement:   URI: '', localName: '', qName: 'count'
Characters: '
    '
startElement: URI: '', localName: '', qName: 'desc'
Characters: '
        This is a description for a sample feed '
startElement: URI: '', localName: '', qName: 'img'
Attribute  URI: '', localName: 'src', qName: 'src', Value: 'http://someimagelink.com/img.jpg'
endElement:   URI: '', localName: '', qName: 'img'
Characters: '
    '
endElement:   URI: '', localName: '', qName: 'desc'
Characters: '
    '
startElement: URI: '', localName: '', qName: 'link'
Characters: 'This is link'
endElement:   URI: '', localName: '', qName: 'link'
Characters: '
'
endElement:   URI: '', localName: '', qName: 'feed'

This indicates that the <img> tag does indeed generate start and end events.
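One detail in that trace is worth noting: localName is empty for every element and only qName carries the name, which is how a parser that is not namespace-aware behaves. A handler that compares localName would match nothing on such a parser. The sketch below, with the hypothetical class LocalNameDemo and helper localNameOfImg(), assumes the JDK's javax.xml.parsers API and shows that turning on namespace awareness populates localName.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class used only to illustrate localName vs. qName.
class LocalNameDemo {

    /** Returns the localName reported for the first <img> start event, or null if none is seen. */
    static String localNameOfImg(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        final String[] seen = new String[1];
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // qName is matched here because it is populated in both modes for unprefixed names.
                if (seen[0] == null && "img".equals(qName)) {
                    seen[0] = localName;
                }
            }
        });
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><desc>Sample <img src=\"http://someimagelink.com/img.jpg\" /></desc></feed>";
        System.out.println("namespace-aware:     '" + localNameOfImg(xml, true) + "'");
        System.out.println("not namespace-aware: '" + localNameOfImg(xml, false) + "'");
    }
}
```

If a handler has to run under both settings, matching on qName (or falling back to it when localName is empty) is one way to keep the comparisons working.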

Don Roby
  • Actually, I had overridden the startElement(), endElement(), and characters() methods. When I put a breakpoint in the startElement() method, I saw that "localName" takes the values "desc" followed by "link". It totally skips "img" :( I am fairly new to this kind of XML/HTML parsing, so I don't know the terminology for this kind of problem, and even internet searches are futile :( Please help! – Sagar Hatekar Apr 19 '11 at 13:22
  • I can't work on it right now, as I have my real job to do, but I'll try to play with your xml this evening and update with some sample parsing code. It seems to me there *should* be a startElement event generated for the img. – Don Roby Apr 19 '11 at 13:38
  • Sure, I appreciate you taking time on this. Meanwhile, I am investigating extracting the tag using a regex, but that's the last option on my mind. Looking forward to your insight on a better solution. Thanks! – Sagar Hatekar Apr 19 '11 at 14:01