I am trying to parse RSS feeds with Groovy. I just want to extract the values of the title and description tags. I used the following code snippet to achieve this:

rss = new XmlSlurper().parse(url)
rss.channel.item.each {
    titleList.add(it.title)
    descriptionList.add(it.description)
}

After this, I access these values in my JSP page. The problem is that the description value I get is not just that of <description> (child of <channel>) but also that of <media:description> (another optional child of <channel>). What can I change to extract only the value of <description> and omit the value of <media:description>?

Edit: To reproduce this behavior, you can run the following code on this website: http://www.tutorialspoint.com/execute_groovy_online.php

def url = "http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml"
rss = new XmlSlurper().parse(url)
rss.channel.item.each {
    println "${it.title}"
    println "${it.description}"
}

You will see that the value of the <media:description> tag is also printed to the console.

clever_bassi
  • Could you please provide either the URL you mentioned or actual XML text that shows the problem? – cfrick Jun 11 '15 at 15:20
  • I am using this XML: http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml The results I get when extracting the description tag also include the values of the <media:description> tag. I verified this by checking the page source. – clever_bassi Jun 11 '15 at 15:23

1 Answer

You can tell XmlSlurper and XmlParser not to handle namespaces by passing false as the second constructor argument (namespaceAware). I believe this does what you are after:

'http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml'.toURL().withReader { r ->
    new XmlSlurper(false, false).parse(r).channel.item.each {
        println it.title
        println it.description
    }
}
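
To see what the second constructor flag changes without hitting the live feed, here is a minimal sketch that parses a small inline document. The sample XML and the variable names are made up for illustration; the namespace declaration mirrors the one used for the `media` prefix in the feed. The import assumes Groovy 3 or later; on Groovy 2.x, `XmlSlurper` is imported by default and the import line should be dropped.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; omit on Groovy 2.x (groovy.util.XmlSlurper is a default import)

// Hypothetical sample feed, trimmed down to the elements that matter here.
def xml = '''<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Sample title</title>
      <description>Plain description</description>
      <media:description>Media description</media:description>
    </item>
  </channel>
</rss>'''

// Default constructor: namespace-aware. Both elements have the local name
// "description", so the GPath expression matches both of them.
def awareItem = new XmlSlurper().parseText(xml).channel.item[0]
def awareText = awareItem.description.text()

// XmlSlurper(validating, namespaceAware): with namespaceAware = false the
// prefix stays part of the element name, so the two elements are distinct.
def plainItem = new XmlSlurper(false, false).parseText(xml).channel.item[0]
def plainText = plainItem.description.text()
def mediaText = plainItem.'media:description'.text()

println "namespace-aware:     ${awareText}"
println "non-namespace-aware: ${plainText}"
println "media:description:   ${mediaText}"
```

On this sample, the namespace-aware lookup returns the text of both elements concatenated, while the non-namespace-aware lookup returns only the plain `description` text.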
tim_yates
  • What do you mean by not handling namespaces? I am not familiar with them. – clever_bassi Jun 11 '15 at 15:47
  • `<media:description>` is an XML tag with a namespace. The tag is `description`, but it is in the namespace `media` (defined by `xmlns:media="http://search.yahoo.com/mrss/"` in the XML). If you tell `XmlSlurper` not to parse namespaces, then this element will need to be accessed via `it.'media:description'` – tim_yates Jun 11 '15 at 15:49
  • OK. So the default XmlSlurper() is non-namespace-aware? Does that mean it tries to get any tag named description that is a child of channel? – clever_bassi Jun 11 '15 at 15:52
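
As a possible alternative to switching namespace handling off, the sketch below keeps the default (namespace-aware) parser and filters the matched elements by namespace URI. It is not part of the answer above. The inline XML and the variable names are hypothetical, and the import assumes Groovy 3 or later (drop it on Groovy 2.x).

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; omit on Groovy 2.x

// Hypothetical sample item with both a plain and a media description.
def feed = '''<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Sample title</title>
      <description>Plain description</description>
      <media:description>Media description</media:description>
    </item>
  </channel>
</rss>'''

def firstItem = new XmlSlurper().parseText(feed).channel.item[0]

// Keep only the <description> element that is in no namespace.
def plainOnly = firstItem.description.find { it.namespaceURI() == '' }.text()

// Keep only the element in the namespace bound to the media prefix.
def mediaOnly = firstItem.description.find {
    it.namespaceURI() == 'http://search.yahoo.com/mrss/'
}.text()

println plainOnly
println mediaOnly
```

This keeps namespace information available for the rest of the document, at the cost of an explicit filter wherever an element name is shared across namespaces.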