12

I'm trying to parse an RSS feed that looks like this and read the "date" attribute:

<rss version="2.0">
<channel>
    <item>
        <y:c date="AA"></y:c>
    </item>
</channel>
</rss>

I tried several variations of this (rssFeed contains the RSS data):

println(((rssFeed \\ "channel" \\ "item" \ "y:c" \"date").toString))

But nothing seems to work. What am I missing?

Any help would really be appreciated!

Chris

4 Answers

20

The "y" in <y:c> is a namespace prefix. It's not part of the element name that \ and \\ match against. Also, attributes are selected with a leading '@'. Try this:

println(((rssFeed \\ "channel" \\ "item" \ "c" \ "@date").toString))
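
A minimal sketch of why the prefixed name fails, assuming scala-xml is on the classpath and using a literal copy of the feed from the question (the name `feed` is only for illustration). The selectors compare against an element's label, and the prefix is stored separately:

```scala
import scala.xml._

// Literal copy of the feed from the question; `feed` is an illustrative name.
val feed: Elem =
  <rss version="2.0">
    <channel>
      <item>
        <y:c date="AA"></y:c>
      </item>
    </channel>
  </rss>

// Grab the element by its local name.
val c: Node = (feed \\ "c").head

val prefix: String = c.prefix // the namespace prefix, kept apart from the name
val label: String  = c.label  // the local name that \ and \\ compare against

// Selecting by the prefixed name finds nothing, because no label equals "y:c".
val byPrefixedName: NodeSeq = feed \\ "y:c"

// The attribute is selected with a leading '@'.
val date: String = (c \ "@date").text
```

Here `byPrefixedName` should come back empty, while `date` should hold the attribute value.
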
sblundy
14

Attributes are retrieved using the "@attrName" selector. Thus, your selector should actually be something like the following:

println((rssFeed \\ "channel" \\ "item" \ "c" \ "@date").text)
Daniel Spiewak
  • 1
    Note the .text to get the date as a String rather than a Node – sblundy May 17 '10 at 18:03
  • 1
    Indeed. The `text` method is generally preferable to `toString` since it will gracefully handle the case where your selector grabbed a chunk of XML rather than a `Text` node. – Daniel Spiewak May 17 '10 at 18:17
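
To illustrate the point made in the comments, here is a small sketch, assuming scala-xml and a literal copy of the question's feed (`feed` and the other value names are illustrative). It compares `text` and `toString` when the selector lands on an attribute and when it lands on a whole element:

```scala
import scala.xml._

// Literal copy of the feed from the question; `feed` is an illustrative name.
val feed: Elem =
  <rss version="2.0">
    <channel>
      <item>
        <y:c date="AA"></y:c>
      </item>
    </channel>
  </rss>

val attr: NodeSeq = feed \\ "c" \ "@date" // selector ends on the attribute
val elem: NodeSeq = feed \\ "c"           // selector ends on the element itself

val attrText: String   = attr.text     // the attribute value
val elemMarkup: String = elem.toString // serialised markup of the whole element
val elemText: String   = elem.text     // text content only; empty for this element
```

With `toString`, a selector that stops one step early hands back markup; with `text`, it hands back only the text content.
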
3

Also, think about the difference between \ and \\. \\ matches any descendant, not just a direct child, like this (note that it jumps from channel to c, skipping item):

scala> (rssFeed \\ "channel" \\ "c" \ "@date").text
res20: String = AA

Or this sort of thing if you just want all the <c> elements, and don't care about their parents:

scala> (rssFeed \\ "c" \ "@date").text            
res24: String = AA

And this specifies an exact path:

scala> (rssFeed \ "channel" \ "item" \ "c" \ "@date").text
res25: String = AA
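
The three queries above all return the same value, so here is a sketch of a case where the two operators differ, assuming scala-xml and a literal copy of the question's feed (`feed` and the value names are illustrative):

```scala
import scala.xml._

// Literal copy of the feed from the question; `feed` is an illustrative name.
val feed: Elem =
  <rss version="2.0">
    <channel>
      <item>
        <y:c date="AA"></y:c>
      </item>
    </channel>
  </rss>

// \ only looks at direct children of rss, and c is three levels down.
val directChild: NodeSeq = feed \ "c"

// \\ searches the whole subtree, so it finds c at any depth.
val anyDepth: NodeSeq = feed \\ "c"

// item is a grandchild of rss, so a single \ misses it as well.
val directItem: NodeSeq = feed \ "item"
```

Only `anyDepth` should be non-empty here; the two single-backslash selectors skip a level and match nothing.
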
James Moore
3

Think about using sequence comprehensions, too. They're useful for dealing with XML, particularly if you need complicated conditions.

For the simple case:

for {
  c <- rssFeed \\ "@date"
} yield c

This gives you every date attribute found anywhere in rssFeed.
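
As a rough sketch of what that comprehension amounts to, assuming scala-xml and the three-entry feed used further down in this answer (`feed`, `viaFor` and `viaMap` are illustrative names), the single-generator `for` is the same as a `map` over the selected attributes:

```scala
import scala.xml._

// Three-entry variant of the feed; `feed` is an illustrative name.
val feed: Elem =
  <rss version="2.0">
    <channel>
      <item>
        <y:c date="AA"></y:c>
        <y:c date="AB"></y:c>
        <y:c date="AC"></y:c>
      </item>
    </channel>
  </rss>

// Comprehension form: one generator over every date attribute in the tree.
val viaFor = for {
  d <- feed \\ "@date"
} yield d.text

// Equivalent method-call form.
val viaMap = (feed \\ "@date").map(_.text)
```

Both values should contain the three dates in document order.
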

But if you want something more complex:

val rssFeed = <rss version="2.0">
                <channel>
                  <item>
                    <y:c date="AA"></y:c>
                    <y:c date="AB"></y:c>
                    <y:c date="AC"></y:c>
                  </item>
                </channel>
              </rss>

val sep = "\n----\n"

for {
  channel <- rssFeed \ "channel"
  item <- channel \ "item"
  y <- item \ "c"
  date <- y \ "@date" if date.text == "AA"
} yield {
  val s = List(channel, item, y, date).mkString(sep)
  println(s)
}

This prints:

    <channel>
                        <item>
                          <y:c date="AA"></y:c>
                          <y:c date="AB"></y:c>
                          <y:c date="AC"></y:c>
                        </item>
                      </channel>
    ----
    <item>
                          <y:c date="AA"></y:c>
                          <y:c date="AB"></y:c>
                          <y:c date="AC"></y:c>
                        </item>
    ----
    <y:c date="AA"></y:c>
    ----
    AA
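
If you want the matching values back rather than printed, a variant along these lines may be closer to what you need. It is only a sketch, assuming scala-xml; `feed`, `wanted` and `matches` are hypothetical names, and the set of wanted dates is made up for the example:

```scala
import scala.xml._

// Same three-entry feed as above; `feed` is an illustrative name.
val feed: Elem =
  <rss version="2.0">
    <channel>
      <item>
        <y:c date="AA"></y:c>
        <y:c date="AB"></y:c>
        <y:c date="AC"></y:c>
      </item>
    </channel>
  </rss>

// Hypothetical filter condition: keep only these dates.
val wanted = Set("AA", "AC")

// Walk the exact path and yield the attribute text instead of printing it.
val matches = for {
  channel <- feed \ "channel"
  item    <- channel \ "item"
  c       <- item \ "c"
  date    <- c \ "@date"
  if wanted.contains(date.text)
} yield date.text
```

Because the comprehension yields strings, `matches` is an ordinary sequence that can be passed on or tested, with the guard free to use any of the bound values (`channel`, `item`, `c`, `date`).
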
James Moore