Parse out key:values in dictionary nested in MRSS feed using Python Feedparser

Question

I've looked through the Python feedparser documentation and done plenty of Googling, but I can't find any example feeds that look like the one I'm working with:

http://smrss.neulion.com/u/nhl/mrss/sights-and-sounds/vod.xml

What I'm trying to access are the mp4 URLs in the media:group --> media:content element of each item in the feed.

Here's my code so far:

#! /usr/bin/python
# -*- coding: utf-8 -*-

import feedparser

d = feedparser.parse('http://smrss.neulion.com/u/nhl/mrss/sights-and-sounds/vod.xml')

for index,item in enumerate(d.entries):
    if index >= 4:
        print item.title
        print item.media_content
        print item.summary

What prints out to Terminal for item.media_content is:

[{'duration': u'150', 'url': u'http://smrss.neulion.com/spmrss/s/nhl/vod/flv/2015/04/19/811204_20150418_PIT_NYR_WIRELESS_1800_sd.mp4', 'type': u'video_sd.mp4'}]

This is a dictionary inside of a list, yes? What would be the best way to iterate through this dictionary in my for loop so I can extract the value at the 'url' key?

score 1 · Accepted Answer · answered May 01 '15 at 18:58

If item.media_content is always a list containing exactly one dictionary, just do this:

for key, val in item.media_content[0].iteritems():
    print key, val
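
Since only the value at the 'url' key is wanted, the dictionary can also be indexed directly instead of looped over. The following is a minimal sketch, written for Python 3 rather than the Python 2 used above. The helper name `media_urls` and the `sample` list are illustrative assumptions; `sample` is a hand-built stand-in shaped like the `item.media_content` output shown in the question, not live feedparser output.

```python
def media_urls(media_content):
    """Return the 'url' value of every dict in a media_content-style list."""
    # media_content is a list of dicts; skip any dict without a 'url' key.
    return [media["url"] for media in media_content if "url" in media]


# Hypothetical stand-in shaped like the item.media_content output in the question.
sample = [{
    "duration": "150",
    "url": "http://smrss.neulion.com/spmrss/s/nhl/vod/flv/2015/04/19/811204_20150418_PIT_NYR_WIRELESS_1800_sd.mp4",
    "type": "video_sd.mp4",
}]

urls = media_urls(sample)
print(urls[0])
```

Inside the loop from the question this would be called as `media_urls(item.media_content)`. It returns an empty list when the list is empty or when no dictionary carries a 'url' key, so it does not depend on there being exactly one dictionary.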

Julien Spronck

Thank you very much for breaking that down! I was missing the [0] — I'm assuming we need to tell Python which index the list is at, even if there's only one media_content list per item in the feed? – AdjunctProfessorFalcon May 01 '15 at 19:14
You're welcome :-) Indeed, you first tell Python to get the first item of the list. – Julien Spronck May 01 '15 at 19:42
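
The point about `[0]` in the comments above can be checked on a small stand-in list. This is a sketch for Python 3; `media_content` here is a hand-made list with a placeholder URL, not real feedparser output.

```python
# Hypothetical stand-in: a list holding a single dictionary.
media_content = [{"url": "http://example.com/clip.mp4", "type": "video_sd.mp4"}]

# A list accepts only integer (or slice) indexes, so a string key raises TypeError.
try:
    media_content["url"]
    string_key_worked = True
except TypeError:
    string_key_worked = False

# Selecting element 0 yields the dictionary, which does accept string keys.
first = media_content[0]
url = first["url"]

print(string_key_worked)
print(url)
```

The index is needed even for a one-element list because the list and the dictionary inside it are separate objects with different lookup rules.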

score 0 · Answer 2 · answered May 01 '15 at 19:13

I'd recommend using BeautifulSoup:

import urllib
from bs4 import BeautifulSoup
url = "http://smrss.neulion.com/u/nhl/mrss/sights-and-sounds/vod.xml"
vod = urllib.urlopen(url)
soup = BeautifulSoup(vod)  # parse the downloaded feed; the query below runs against this tree

In [1752]: [i['url'] for i in soup.findAll('media:content') if i.has_attr('url')]
Out[1752]: 
['http://smrss.neulion.com/spmrss/s/nhl/vod/flv/2015/04/30/817293_C150008B_20150428_ROUND_ONE_WIRELESS_RECAP_SHORT_5000_sd.mp4',
 'http://smrss.neulion.com/spmrss/s/nhl/vod/flv/2015/04/28/816995_20150427_NHL_Playoff_Access_NYI_WSH_GM7_5000_sd.mp4',
 'http://smrss.neulion.com/spmrss/s/nhl/vod/flv/2015/04/26/816230_20150426_WIRELESS_RECAP_5000_sd.mp4',
 'http://smrss.neulion.com/spmrss/s/nhl/vod/flv/2015/04/25/815823_20150425_WIRELESS_GM5_OTT_5000_sd.mp4',
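
For comparison, the same url attributes can be collected without a third-party package. The sketch below swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` and is written for Python 3. `SAMPLE_FEED` is a made-up, heavily shortened feed with a placeholder URL, and `content_urls` is a hypothetical helper name; the only assumption about the real feed is that it binds the `media` prefix to the usual Media RSS namespace URI.

```python
import xml.etree.ElementTree as ET

# Namespace URI that Media RSS documents bind to the "media" prefix.
NS = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical, heavily shortened feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Clip one</title>
      <media:group>
        <media:content url="http://example.com/one_sd.mp4" type="video_sd.mp4"/>
      </media:group>
    </item>
    <item>
      <title>Clip two</title>
      <media:group>
        <media:content type="video_sd.mp4"/>
      </media:group>
    </item>
  </channel>
</rss>"""


def content_urls(xml_text):
    """Collect the url attribute of every media:content element."""
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants; elements without a url attribute are skipped.
    return [
        element.get("url")
        for element in root.iterfind(".//media:content", NS)
        if element.get("url")
    ]


print(content_urls(SAMPLE_FEED))
```

The filter on `element.get("url")` plays the same role as `has_attr('url')` in the BeautifulSoup version: items whose media:content element carries no url are left out of the result.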
