Python feedparser return URL of first media item in first entry

Question

I'm working with Python for the first time and I am a bit stuck.

Using feedparser to parse an RSS feed, I want to get the URL of the first media item of entry 0 and load it into a variable.

The code below seems to work, but I have to press Enter twice to run it, and it returns the URLs for ALL media items in entry 0, whereas I only want the first (16x9) image URL.

>>> import feedparser
>>> d = feedparser.parse("http://www.abc.net.au/news/feed/45910/rss")
>>> for content in d.entries[0].media_content: print content['url']

- Link to where I got the code above

RSS XML:

<media:group>
  <media:description>French fighter jets take off to drop bombs on the Islamic State stronghold of Raqqa in Syria. (Supplied)</media:description>
  <media:content url="http://www.abc.net.au/news/image/6943630-16x9-2150x1210.jpg" medium="image" type="image/jpeg" width="2150" height="1210"/>
  <media:content url="http://www.abc.net.au/news/image/6943630-4x3-940x705.jpg" medium="image" type="image/jpeg" width="940" height="705"/>
  <media:content url="http://www.abc.net.au/news/image/6943630-3x2-940x627.jpg" medium="image" type="image/jpeg" width="940" height="627" isDefault="true"/>
  <media:content url="http://www.abc.net.au/news/image/6943630-3x4-940x1253.jpg" medium="image" type="image/jpeg" width="940" height="1253"/>
  <media:content url="http://www.abc.net.au/news/image/6943630-1x1-1400x1400.jpg" medium="image" type="image/jpeg" width="1400" height="1400"/>
  <media:thumbnail url="http://www.abc.net.au/news/image/6943630-4x3-140x105.jpg" width="140" height="105"/>
</media:group>

It looks like this when run in Python:

>>> for content in d.entries[0].media_content: print content['url']
... 
http://www.abc.net.au/news/image/6943630-16x9-2150x1210.jpg
http://www.abc.net.au/news/image/6943630-4x3-940x705.jpg
http://www.abc.net.au/news/image/6943630-3x2-940x627.jpg
http://www.abc.net.au/news/image/6943630-3x4-940x1253.jpg
http://www.abc.net.au/news/image/6943630-1x1-1400x1400.jpg
>>>

score 2 · Accepted Answer · answered Nov 16 '15 at 01:57


Quick answer:

url = d.entries[0].media_content[0]['url']

d.entries[n].media_content is a list full of dicts, so you can just get the first item in that list and store the value at "url" in a variable.

Here's how it looks in the Python shell:

>>> import feedparser
>>> d = feedparser.parse("http://www.abc.net.au/news/feed/45910/rss")
>>> url = d.entries[0].media_content[0]['url']
>>> print url
http://www.abc.net.au/news/image/6943798-16x9-2150x1210.jpg
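
Indexing with [0] can raise an exception when an entry carries no media at all. Below is a minimal sketch of a more defensive lookup, assuming the items behave like dicts as described above. The names first_media_url and sample are hypothetical and only for illustration; sample stands in for d.entries[0].media_content so the snippet runs without network access.

```python
def first_media_url(media_items):
    """Return the 'url' of the first media item that has one, else None."""
    # Treat a missing value (None) the same as an empty list.
    for item in media_items or []:
        url = item.get('url')
        if url:
            return url
    return None


# Stand-in for d.entries[0].media_content, using URLs from the question.
sample = [
    {'url': 'http://www.abc.net.au/news/image/6943630-16x9-2150x1210.jpg'},
    {'url': 'http://www.abc.net.au/news/image/6943630-4x3-940x705.jpg'},
]

url = first_media_url(sample)
```

With a parsed feed, this could presumably be called as first_media_url(d.entries[0].get('media_content')), so that an entry without media yields None instead of an error.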


Simon Andrews


Thank you! I started playing with adding another index, but I must have been making a simple error somewhere. – Quantum_Kernel Nov 16 '15 at 02:04
No problem! You'll probably also want to make sure it's the 16x9 image with some regex since you can't guarantee the RSS will always serve that image first, but that should be easy enough. – Simon Andrews Nov 16 '15 at 02:51
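
The 16x9 check suggested in the comment above could be sketched as follows. This assumes the aspect ratio always appears in the file name as "-16x9-" followed by the pixel dimensions, as in the URLs from the question. The names find_16x9_url, ASPECT_16X9 and media_content here are hypothetical, and the list is sample data, not live feed output.

```python
import re

# Matches e.g. ".../6943630-16x9-2150x1210.jpg" (pattern inferred from the question's URLs).
ASPECT_16X9 = re.compile(r'-16x9-\d+x\d+\.')


def find_16x9_url(media_items):
    """Return the first URL whose file name marks it as 16x9, else None."""
    for item in media_items:
        url = item.get('url', '')
        if ASPECT_16X9.search(url):
            return url
    return None


# Sample data with the 16x9 image deliberately not in first position.
media_content = [
    {'url': 'http://www.abc.net.au/news/image/6943630-4x3-940x705.jpg'},
    {'url': 'http://www.abc.net.au/news/image/6943630-16x9-2150x1210.jpg'},
]

url_16x9 = find_16x9_url(media_content)
```

Returning None when nothing matches leaves it to the caller to decide whether to fall back to the first item.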
