22

I am using PHP and simpleXML to read the following rss feed:

http://feeds.bbci.co.uk/news/england/rss.xml

I can get most of the information I want like so:

$rss = simplexml_load_file('http://feeds.bbci.co.uk/news/england/rss.xml');

echo '<h1>'. $rss->channel->title . '</h1>';

foreach ($rss->channel->item as $item) {
   echo '<h2><a href="'. $item->link .'">' . $item->title . "</a></h2>";
   echo "<p>" . $item->pubDate . "</p>";
   echo "<p>" . $item->description . "</p>";
} 

But how would I output the thumbnail image that is in the following tag:

<media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/51078000/jpg/_51078953_226alanpotbury.jpg"/>  
Joel Christophel
geoffs3310

2 Answers

21

As you already know, SimpleXML lets you select a node's child using the object property operator -> or a node's attribute using the array access ['name']. It's great, but the operation only works if what you select belongs to the same namespace.

If you want to "hop" from one namespace to another, you can use the children() or attributes() methods. In your case, this is made a bit trickier because you have <item/> in the global namespace, the node you're looking for is in the "media" namespace* and then the attributes are in the global namespace again (they are not prefixed). So using the normal object/array notation you'll have to "hop" twice:

foreach ($rss->channel->item as $item)
{
    // we load the attributes into $thumbAttr
    // you can either use the namespace prefix
    $thumbAttr = $item->children('media', true)->thumbnail->attributes();

    // or preferably the namespace name, read note below for an explanation
    $thumbAttr = $item->children('http://search.yahoo.com/mrss/')->thumbnail->attributes();

    echo $thumbAttr['url'];
}

*Note

I refer to the namespace as the "media" namespace but that's not really correct. The namespace name is http://search.yahoo.com/mrss/, and "media" is just a prefix, some sort of alias if you will. What's important to keep in mind is that http://search.yahoo.com/mrss/ is the real name of the namespace. At some point, your RSS provider might decide to change the prefix to, say, "yahoo", and your script will stop working if it refers to the "media" prefix. However, if you use the namespace name, it will keep working no matter the prefix.
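
To make the note concrete, here is a minimal sketch of what happens when the prefix changes. The XML fragment, the "yahoo" prefix and the example.com URL are made up for illustration; only the namespace name comes from the feed. It also assumes PHP 5.4 or later for the `attributes()['url']` syntax.

```php
<?php
// Made-up feed fragment: same namespace name as the BBC feed,
// but bound to the hypothetical prefix "yahoo" instead of "media".
$xml = <<<XML
<rss version="2.0" xmlns:yahoo="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example item</title>
      <yahoo:thumbnail width="66" height="49" url="http://example.com/thumb.jpg"/>
    </item>
  </channel>
</rss>
XML;

$rss  = simplexml_load_string($xml);
$item = $rss->channel->item[0];

// Lookup by prefix: "media" is not used in this document, so nothing matches.
$byPrefix      = $item->children('media', true);
$foundByPrefix = isset($byPrefix->thumbnail);

// Lookup by namespace name: independent of whatever prefix the feed uses.
$byName    = $item->children('http://search.yahoo.com/mrss/');
$urlByName = (string) $byName->thumbnail->attributes()['url'];

var_dump($foundByPrefix);
echo $urlByName, "\n";
```

The prefix-based lookup should come back empty here, while the lookup by namespace name still reaches the thumbnail.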

Josh Davis
  • The script that you have specified is the way to read XML and put it on our site. Suppose I have to trigger this script to read the XML only if there is a change in the RSS feed (XML content), how do I do that? – Viswa Feb 11 '12 at 04:13
  • This is unrelated to XML, please post it as a new question so that it can be answered properly. – Josh Davis Feb 11 '12 at 07:02
6

SimpleXML is pretty bad at handling namespaces. You have two choices. The simplest hack is to read the contents of the feed into a string and strip the namespace prefix:

$feed = file_get_contents('http://feeds.bbci.co.uk/news/england/rss.xml');
$feed = str_replace('<media:', '<', $feed);

$rss = simplexml_load_string($feed);
...

Now you can access the element thumbnail directly.

The more elegant (not really) method is to find out what URI the namespace uses. If you look at the source code for http://feeds.bbci.co.uk/news/england/rss.xml you will see that the media prefix points to http://search.yahoo.com/mrss/.

Now you can use this URI in the children() method of a SimpleXMLElement to get the contents of the media:thumbnail element:

$rss = simplexml_load_file('http://feeds.bbci.co.uk/news/england/rss.xml');

foreach ($rss->channel->item as $item) {
    $media = $item->children('http://search.yahoo.com/mrss/');
    ...
}
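
Not every item is guaranteed to carry a thumbnail, so a guarded lookup may be useful. The following is only a sketch under a few assumptions: `thumbnailUrl()` and the `MEDIA_NS` constant are hypothetical names, the inline XML is made-up sample data standing in for the real feed, and the nullable return type needs PHP 7.1 or later.

```php
<?php
// Namespace name taken from the feed's xmlns:media declaration.
const MEDIA_NS = 'http://search.yahoo.com/mrss/';

// Hypothetical helper: returns the thumbnail URL of an <item>,
// or null when the item has no media:thumbnail or no url attribute.
function thumbnailUrl(SimpleXMLElement $item): ?string
{
    $media = $item->children(MEDIA_NS);
    if (!isset($media->thumbnail)) {
        return null;
    }
    // The attributes are unprefixed, so read them via attributes().
    $attrs = $media->thumbnail->attributes();
    return isset($attrs['url']) ? (string) $attrs['url'] : null;
}

// Made-up sample data: one item with a thumbnail, one without.
$xml = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>With thumbnail</title>
      <media:thumbnail width="66" height="49" url="http://example.com/a.jpg"/>
    </item>
    <item>
      <title>Without thumbnail</title>
    </item>
  </channel>
</rss>
XML;

$rss  = simplexml_load_string($xml);
$urls = [];
foreach ($rss->channel->item as $item) {
    $url    = thumbnailUrl($item);
    $urls[] = $url;
    if ($url !== null) {
        // Escape the value before putting it into HTML.
        echo '<img src="' . htmlspecialchars($url, ENT_QUOTES) . '" alt="">', "\n";
    }
}
```

With the real feed you would replace `simplexml_load_string($xml)` with the `simplexml_load_file()` call from the question.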
Björn
  • 6
    -1 for suggesting naive string manipulation as any kind of option compared to the built-in namespace handling. Not sure why you think the `children` method is "pretty bad" and "not elegant" - you have to tell SimpleXML *somewhere* which namespace you want; you can even (since PHP 5.3) use the XML prefix (`->children('media', true)`), although the URI is the only identifier guaranteed not to change if the XML is generated slightly differently. – IMSoP May 01 '13 at 16:36
  • 1
    It's not the best idea (see IMSoP comment above), but since that's the accepted answer, here is the correct way to do it: `str_replace(array(' – SuN Jun 06 '14 at 09:05
  • 1
    @sun: That's not much better. It will break. This doesn't even do any *bare* tag parsing. – hakre Jun 08 '14 at 14:58