python feedparser inconsistent items

Question

I am executing these lines:

import feedparser
url = 'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/feed.xml'
feed = feedparser.parse(url)
items = feed['items']
print items[0]['links'][1]['href']

These lines use the feedparser module. Here is a sample chunk of the RSS feed in question:

    <item>
    <title>More Android Annotations</title>
    <link>http://youtu.be/77pPceVicNI</link>
    <description><![CDATA[Walkthrough that goes a little bit more indepth to show you the things that <a href="http://androidannotations.org">AndroidAnnotations</a> can do for you as an application developer. <br /><a href="https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4">Direct download link <i>(rightclick and choose save as)</i></a>]]></description>
    <image>
        <url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
        <link>https://github.com/FoamyGuy/StackSites</link>
        <title>More Android Annotations</title>
    </image>
  </item>

I am trying to get the https://github.com/FoamyGuy/StackSites portion of the item. On my local PC (Windows 7, Python 2.6) this works correctly. But when I execute the same lines in a console on pythonanywhere.com, I get https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4 instead of my GitHub link. That is the mp4 link included near the end of the CDATA in the description.

On both machines items[0]['links'] contains only 2 elements (indexes 0 and 1), but the string at index 1 differs between the two machines. Why would feedparser give me different values on one machine than on another?

I have printed the entire items[0] on pythonanywhere and my github link is not included in it at all. Is there some parameter I can use to alter the way the feed gets parsed so I can correctly get the github link out of it?

Is there some other feed parsing module that would work better for me and hopefully be more consistent across machines?

Could it be some kind of geolocation thing? The PythonAnywhere servers are in the US; maybe you live somewhere else, and the server returns different results based on IP? — hwjp, Oct 18 '13 at 12:10
I live in the US (and I think PythonAnywhere is UK-based). Either way, it shouldn't be a geolocation issue, because the XML in question is under my control and shouldn't change based on region. — FoamyGuy, Oct 18 '13 at 13:14

score 0 · Answer 1 · answered Oct 18 '13 at 14:52

After experimenting with your feed, I see that each item has two entries in "links", and they differ consistently: one has rel="alternate" and the other has rel="enclosure".

In [8]: items[0]['links']
Out[8]:
[{'href': u'http://youtu.be/NL7szHeEiCs',
  'rel': u'alternate',
  'type': u'text/html'},
 {u'href': u'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/ButtonExample.mp4',
  'rel': u'enclosure'}]

In [9]: items[1]['links']
Out[9]:
[{'href': u'http://youtu.be/77pPceVicNI',
  'rel': u'alternate',
  'type': u'text/html'},
 {u'href': u'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4',
  'rel': u'enclosure'}]

So, would you be able to use that to get the one you want?

def get_alternate_link(item):
    # Return the href of the first link marked rel="alternate", or None.
    for link in item.links:
        if link.get('rel') == 'alternate':
            return link.get('href')
    return None
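
Note that the helper above returns the rel="alternate" link, which in the output shown is the youtu.be URL, not the GitHub link nested inside <image>. If feedparser does not expose that nested link on a given machine, one possible fallback is to read <image><link> from the raw item XML. This swaps in the standard library's xml.etree.ElementTree in place of feedparser. The following is only a minimal sketch, assuming the item markup matches the sample in the question; get_image_link and SAMPLE_ITEM are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <item> from the question (description omitted).
SAMPLE_ITEM = """
<item>
  <title>More Android Annotations</title>
  <link>http://youtu.be/77pPceVicNI</link>
  <image>
    <url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
    <link>https://github.com/FoamyGuy/StackSites</link>
    <title>More Android Annotations</title>
  </image>
</item>
"""


def get_image_link(item_xml):
    """Return the text of <image><link> inside one <item>, or None if absent."""
    item = ET.fromstring(item_xml)
    # 'image/link' only matches a <link> that is a child of <image>,
    # so the item's own top-level <link> is ignored.
    link = item.findtext('image/link')
    return link.strip() if link else None


image_link = get_image_link(SAMPLE_ITEM)
print(image_link)
```

For a whole feed you would still need to fetch the XML yourself and apply the same path to each <item>; that part is left out here.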

I can test it later today. I'll let you know. — FoamyGuy, Oct 18 '13 at 15:26
