Parse RSS with Groovy

Question

I am trying to parse RSS feeds with Groovy. I only want to extract the values of the title and description tags. I used the following code snippet to achieve this:

rss = new XmlSlurper().parse(url)
rss.channel.item.each {
    titleList.add(it.title)
    descriptionList.add(it.description)
}

After this, I access these values in my JSP page. The problem is that the description value I get contains not just the text of <description> (child of <channel>) but also that of <media:description> (another optional child of <channel>). What can I change to extract only the value of <description> and omit the value of <media:description>?

Edit: To reproduce this behavior, you can execute the following code on this website: http://www.tutorialspoint.com/execute_groovy_online.php

def url = "http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml"
rss = new XmlSlurper().parse(url)
rss.channel.item.each {
    println "${it.title}"
    println "${it.description}"
}

You will see that the content of the <media:description> tag is also printed to the console.

Could you please provide either the URL in question or the actual XML text that shows the problem? – cfrick Jun 11 '15 at 15:20
I am using this XML: http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml The results I get by extracting the description tag also include the values of the <media:description> tag. I verified it by checking the page source. – clever_bassi Jun 11 '15 at 15:23

score 1 · Accepted Answer · answered Jun 11 '15 at 15:44

You can tell XmlSlurper and XmlParser not to handle namespaces by passing flags to the constructor (the second `false` below turns off namespace awareness). I believe this does what you are after:

'http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml'.toURL().withReader { r ->
    new XmlSlurper(false, false).parse(r).channel.item.each {
        println it.title
        println it.description
    }
}

tim_yates


What do you mean by not to handle namespaces? I am not familiar with it – clever_bassi Jun 11 '15 at 15:47
`<media:description>` is an XML tag with a namespace. The tag is `description`, but it is in the namespace `media` (defined by `xmlns:media="http://search.yahoo.com/mrss/"` in the XML). If you tell `XmlSlurper` not to parse namespaces, then this element will need to be accessed via `it.'media:description'` – tim_yates Jun 11 '15 at 15:49
OK. So the default XmlSlurper() is non-namespace-aware? Does that mean it picks up any tag named description that is a child of channel? – clever_bassi Jun 11 '15 at 15:52
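
To make the difference discussed in these comments concrete, here is a minimal sketch that parses a small inline document both ways. The XML snippet and the variable names (`xml`, `aware`, `unaware`) are made up for illustration and are not taken from the NYT feed, and the import assumes Groovy 3 or later. As far as I can tell, the no-argument constructor is namespace-aware, and a plain property lookup such as `it.description` then matches on the local name only, which would explain why both elements come back.

```groovy
// Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlSlurper is imported by default)
import groovy.xml.XmlSlurper

// Hypothetical, trimmed-down item loosely modelled on the structure described in the question
def xml = '''<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Sample headline</title>
      <description>Plain description</description>
      <media:description>Media description</media:description>
    </item>
  </channel>
</rss>'''

// No-argument constructor: namespace-aware parsing, lookup by local name
def aware = new XmlSlurper().parseText(xml).channel.item[0]

// (validating = false, namespaceAware = false): the prefix stays part of the element name
def unaware = new XmlSlurper(false, false).parseText(xml).channel.item[0]

// Number of elements matched by the plain lookup in each mode
println aware.description.size()
println unaware.description.size()

// Text of the un-prefixed element only
println unaware.description.text()

// Per the comment above, the prefixed element is then reached via its quoted name
println unaware.'media:description'.text()
```

With namespace handling switched off, the plain lookup should match only the un-prefixed element, which is the behavior the question asks for.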
