Parsing the CDATA section in XML using XML Pull Parser

Question

Sample XML

<feed xmlns="http://www.w3.org/2005/Atom">
    <title>NDTV News - Top Stories</title>
    <link>http://www.ndtv.com/</link>
    <description>Latest entries</description>
    <language>en</language>
    <pubDate>Wed, 31 Jul 2013 22:33:00 GMT</pubDate>
    <lastBuildDate>Wed, 31 Jul 2013 22:33:00 GMT</lastBuildDate>
    <entry>
    <title>Narendra Modi to be BJP's PM candidate, announcement before crucial assembly polls: sources</title>
    <link>http://feedproxy.google.com/~r/NdtvNews-TopStories/~3/XN7dMIDe5YI/story01.htm</link>
    <published>Wed, 31 Jul 2013 13:58:31 GMT</published>
    <author>
    <name>user42715</name>
    </author>
    <content type="html"><![CDATA[<div align="center"><a href="http://www.ndtv.com/news/images/topstory_thumbnail/  Shatrughan_Sinha_agency_120.jpg"><img border="0" src="http://www.ndtv.com/news/images/topstory_thumbnail/Shatrughan_Sinha_agency_120.jpg" alt="2013-07-29-08-43-05" /></a></div><p><span style="font-size: large;">The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.</span><br /><br /><span style="font-size: large;"> The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September. </span><br /><br /><span style="font-size: large;">The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.   </span><br /><br /></p>]]></content>
   </entry>
</feed>

With the code below I was able to retrieve the <title>, <published> and <author> values within the <entry> tag.

        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        XmlPullParser parser = factory.newPullParser();
        InputStream urlStream = downloadUrl(urlString);
        parser.setInput(urlStream, null);
        int eventType = parser.getEventType();
        boolean done = false;

        while (eventType != XmlPullParser.END_DOCUMENT && !done) {
            tagName = parser.getName();

            switch (eventType) {
            case XmlPullParser.START_DOCUMENT:                  
                break;
            case XmlPullParser.START_TAG:
                if (tagName.equals("entry")) {                      
                }
                if (tagName.equals("title")) {
                    title = parser.nextText().toString();
                    Log.i(TITLE, title);
                }
                if (tagName.equals("published")) {
                    pubDate = parser.nextText().toString();
                    Log.i(PUBLISHEDDATE, pubDate);
                }

                if (tagName.equals("author")) {
                    readAuthor(parser);
                    Log.i(AUTHOR, author);
                }

                break;
            case XmlPullParser.END_TAG:
                if (tagName.equals("feed")) {
                    done = true;
                } else if (tagName.equals("entry")) {

                    rssFeed = new RssFeedStructure(title);
                    rssFeedList.add(rssFeed);
                }
                break;
            }
            eventType = parser.next();
        }

        private String readAuthor(XmlPullParser parser) throws IOException,
            XmlPullParserException {
            parser.nextTag();
            parser.require(XmlPullParser.START_TAG, null, "name");
            author = parser.nextText().toString();
            parser.require(XmlPullParser.END_TAG, null, "name");
            return author;
        }

From the <content> tag, how can I retrieve the "href" value within the <a> tag and the text value (The BJP is likely to anoint Narendra Modi.....) from the <span> tag?

Does it work now? I updated the post; did you check it? It works fine now, and you get the data from the second span as well. — Raghunandan, Aug 12 '13 at 12:44

Raghunandan · Answer 1 · 2013-08-12T12:46:09.520

You can use Jsoup. Download it from http://jsoup.org/download and add the jar to the libs folder.

To test the parser, I copied the RSS feed to an XML file in the assets folder (locally):

XmlPullParser xpp = factory.newPullParser();
InputStream is = this.getAssets().open("xmlparser.xml");
xpp.setInput(is, "UTF-8"); // the charset name is "UTF-8", not "UTF_8"

Since you already have the URL, you can use the code below. I have shown how to extract the link URL and the content; you need to extract the contents of the other tags as you normally would.

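// Runs inside a method that already has urlStream; the last brace below closes that method.
try {
    XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
    XmlPullParser xpp = factory.newPullParser();

    xpp.setInput(urlStream, null);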

    boolean insideItem = false;

    // Returns the type of current event: START_TAG, END_TAG, etc..
    int eventType = xpp.getEventType();
    while (eventType != XmlPullParser.END_DOCUMENT) {
        if (eventType == XmlPullParser.START_TAG) {

            if (xpp.getName().equalsIgnoreCase("entry")) {
                insideItem = true;
            }
             else if (xpp.getName().equalsIgnoreCase("content")) {
                    if (insideItem)
                    {
                        Document doc = Jsoup.parse(xpp.nextText());

                        Elements links = doc.select("a[href]"); // a with href
                          for (Element link : links) {
                                Log.i("........",""+link.attr("abs:href"));
                            }

                        Element divcontent = doc.select("span").first();

                        Log.i("..........",""+divcontent.text());

                    }
                }
        } else if (eventType == XmlPullParser.END_TAG
                && xpp.getName().equalsIgnoreCase("entry")) {
            insideItem = false;
        }

        eventType = xpp.next(); // move to next element
    }

} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (XmlPullParserException e1) {
    e1.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
}

Log :

08-03 08:03:04.413: I/........(1524): http://www.ndtv.com/news/images/topstory_thumbnail/   Shatrughan_Sinha_agency_120.jpg
08-03 08:03:04.423: I/..........(1524): The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.

Edit: To loop through the elements

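Elements divcontent = doc.select("span");
// k starts at 1 because the first span was already logged above; start at 0 to include it.
for (int k = 1; k < divcontent.size(); k++)
{
     String spancontent = divcontent.get(k).text();
     Log.i("..........", spancontent);
}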
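
If each entry needs to keep its own content, one option is to collect the CDATA text per <entry> instead of reusing a single variable. The following is only a rough sketch of that idea, not the code from this answer. It swaps in the JDK's StAX reader (javax.xml.stream) for XmlPullParser so it can run off-device with no extra jars, and a simple regex stands in for Jsoup's "a[href]" selector. The class name CdataFeedSketch and the helpers contentOfEachEntry and hrefsIn are hypothetical, and the example.com URLs are made-up test data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CdataFeedSketch {

    // Hypothetical helper: returns the raw HTML held in each <entry>'s <content> CDATA block.
    static List<String> contentOfEachEntry(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> contents = new ArrayList<>();
        boolean insideEntry = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("entry")) {
                    insideEntry = true;
                } else if (name.equals("content") && insideEntry) {
                    // getElementText() returns the text and CDATA up to </content>,
                    // so every entry contributes its own string to the list.
                    contents.add(reader.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("entry")) {
                insideEntry = false;
            }
        }
        reader.close();
        return contents;
    }

    // Rough stand-in for Jsoup's "a[href]" selector; a regex is not a real HTML parser.
    static List<String> hrefsIn(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = Pattern.compile("<a\\s[^>]*href=\"([^\"]*)\"").matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up two-entry feed, shaped like the sample XML in the question.
        String xml = "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<entry><content type=\"html\"><![CDATA[<a href=\"http://example.com/a.jpg\">x</a>"
                + "<span>First</span>]]></content></entry>"
                + "<entry><content type=\"html\"><![CDATA[<a href=\"http://example.com/b.jpg\">y</a>"
                + "<span>Second</span>]]></content></entry>"
                + "</feed>";
        for (String html : contentOfEachEntry(xml)) {
            System.out.println(hrefsIn(html) + " <- " + html);
        }
    }
}
```

On Android the same shape should carry over to the loop above: call Jsoup.parse(xpp.nextText()) inside the content branch and store the result for that entry before moving on to the next one.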

It is working fine, but it does not return the text enclosed within the second <span> tag. — user1382802, Aug 12 '13 at 11:59
@user1382802 Loop through the elements and get the text; you will get the content of the second span as well. Try it and let me know if it works. — Raghunandan, Aug 12 '13 at 12:14
For all the items, I'm getting the first item's description as the spancontent. — user1382802, Aug 14 '13 at 09:19
@user1382802 I didn't understand your comment. Can you be more specific? — Raghunandan, Aug 14 '13 at 09:33
While parsing the second item (<entry>) in the feed, it displays the first item's (<entry>) text within the <span>, i.e. for all the items it displays the first item's text within the <span> tag. — user1382802, Aug 14 '13 at 10:07
@user1382802 I have given you a solution for what you posted. It works; you need to modify it according to your requirement. Are you expecting me to do your work? What is the difficulty in looping through the tags to get the data? It took you days to even reply. I don't think I can help further. — Raghunandan, Aug 14 '13 at 10:10
