I am executing these lines:

import feedparser
url = 'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/feed.xml'
feed = feedparser.parse(url)
items = feed['items']
print items[0]['links'][1]['href']

These lines use the feedparser module. Here is a sample chunk of the RSS feed in question:

    <item>
    <title>More Android Annotations</title>
    <link>http://youtu.be/77pPceVicNI</link>
    <description><![CDATA[Walkthrough that goes a little bit more indepth to show you the things that <a href="http://androidannotations.org">AndroidAnnotations</a> can do for you as an application developer. <br /><a href="https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4">Direct download link <i>(rightclick and choose save as)</i></a>]]></description>
    <image>
        <url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
        <link>https://github.com/FoamyGuy/StackSites</link>
        <title>More Android Annotations</title>
    </image>
  </item>

I am trying to get the https://github.com/FoamyGuy/StackSites portion of the item. On my local PC (Windows 7, Python 2.6) this works correctly. But when I execute the same lines in a console on pythonanywhere.com, instead of my GitHub link I get https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4, which is the mp4 link included near the end of the CDATA in the description.

On both machines items[0]['links'] contains only 2 elements (indexes 0 and 1), but the string at index 1 differs between the two machines. Why would feedparser give me different values on one machine than on another?

I have printed the entire items[0] on pythonanywhere and my github link is not included in it at all. Is there some parameter I can use to alter the way the feed gets parsed so I can correctly get the github link out of it?

Is there some other feed parsing module that would work better for me and hopefully be more consistent across machines?

FoamyGuy
  • Could it be some kind of geolocation thing? The PythonAnywhere servers are in the US; maybe you live somewhere else, and the server returns different results based on IP? – hwjp Oct 18 '13 at 12:10
  • I live in the US (and I think PythonAnywhere is UK based). Either way, it shouldn't be a geolocation issue, because the XML in question is under my control and shouldn't change based on region. – FoamyGuy Oct 18 '13 at 13:14

1 Answer

Having experimented with your feed, it looks like each item has two entries in "links", and they differ consistently: one has rel="alternate" and the other has rel="enclosure".

In [8]: items[0]['links']
Out[8]:
[{'href': u'http://youtu.be/NL7szHeEiCs',
  'rel': u'alternate',
  'type': u'text/html'},
 {u'href': u'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/ButtonExample.mp4',
  'rel': u'enclosure'}]

In [9]: items[1]['links']
Out[9]:
[{'href': u'http://youtu.be/77pPceVicNI',
  'rel': u'alternate',
  'type': u'text/html'},
 {u'href': u'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4',
  'rel': u'enclosure'}]

So, would you be able to use that to get the one you want?

def get_alternate_link(item):
    """Return the href of the item's rel="alternate" link, or None."""
    for link in item.links:
        if link.get('rel') == 'alternate':
            return link.get('href')
    return None
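
To try the idea without fetching the feed, here is a minimal sketch that runs the same rel-based lookup against the data from the Out[9] listing. Plain dicts stand in for the objects feedparser returns, and `get_link_by_rel` and `sample_item` are hypothetical names introduced only for this illustration.

```python
# Sample data copied from the Out[9] listing; plain dicts stand in for
# the objects feedparser returns (an assumption made for illustration).
sample_item = {
    'links': [
        {'href': 'http://youtu.be/77pPceVicNI',
         'rel': 'alternate',
         'type': 'text/html'},
        {'href': 'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4',
         'rel': 'enclosure'},
    ]
}


def get_link_by_rel(item, rel):
    """Return the href of the first link whose rel matches, else None.

    Hypothetical helper: it selects by rel instead of by list position,
    and uses key access so it also works on plain dicts.
    """
    for link in item.get('links', []):
        if link.get('rel') == rel:
            return link.get('href')
    return None


alternate = get_link_by_rel(sample_item, 'alternate')
enclosure = get_link_by_rel(sample_item, 'enclosure')
print(alternate)
print(enclosure)
```

Selecting by rel rather than by index means the result does not depend on the order in which the links happen to be listed.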
hwjp
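
Note that the rel="alternate" entry is the item's own <link> (the YouTube URL), whereas the GitHub URL in the sample sits inside the nested <image> element. If that nested link is what is needed, one option is to read it straight from the XML. The following is a minimal sketch, assuming the feed keeps the structure shown in the question; it uses the standard library's xml.etree.ElementTree instead of feedparser, and `SAMPLE_ITEM` and `get_image_link` are illustrative names, not part of either library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <item> from the question (description omitted).
SAMPLE_ITEM = """
<item>
  <title>More Android Annotations</title>
  <link>http://youtu.be/77pPceVicNI</link>
  <image>
    <url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
    <link>https://github.com/FoamyGuy/StackSites</link>
    <title>More Android Annotations</title>
  </image>
</item>
"""


def get_image_link(item_element):
    """Return the text of the item's <image><link>, or None if absent."""
    # findtext returns None when no element matches the path.
    return item_element.findtext('image/link')


item = ET.fromstring(SAMPLE_ITEM.strip())
image_link = get_image_link(item)
print(image_link)
```

For the whole feed, the same function could be applied to each element yielded by `root.iter('item')` after parsing the downloaded XML, which should behave the same on any machine because it does not rely on how a feed library interprets the nonstandard item-level <image>.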