I am executing these lines:
import feedparser
url = 'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/feed.xml'
feed = feedparser.parse(url)
items = feed['items']
print items[0]['links'][1]['href']
These lines use the feedparser module. Here is a sample chunk of the RSS feed in question:
<item>
<title>More Android Annotations</title>
<link>http://youtu.be/77pPceVicNI</link>
<description><![CDATA[Walkthrough that goes a little bit more indepth to show you the things that <a href="http://androidannotations.org">AndroidAnnotations</a> can do for you as an application developer. <br /><a href="https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4">Direct download link <i>(rightclick and choose save as)</i></a>]]></description>
<image>
<url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
<link>https://github.com/FoamyGuy/StackSites</link>
<title>More Android Annotations</title>
</image>
</item>
I am trying to get the https://github.com/FoamyGuy/StackSites
portion of the item. On my local PC (Windows 7, Python 2.6) this works correctly. But when I execute the same lines in a console on pythonanywhere.com, instead of my GitHub link I get https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4
which is the mp4 link included near the end of the CDATA in the description.
On both machines items[0]['links']
contains only 2 elements (indexes 0 and 1), but the string at index 1 differs between the two machines. Why would feedparser give me back different values on one machine than on the other?
I have printed the entire items[0]
on pythonanywhere and my github link is not included in it at all. Is there some parameter I can use to alter the way the feed gets parsed so I can correctly get the github link out of it?
Is there some other feed parsing module that would work better for me and hopefully be more consistent across machines?
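For reference, here is a minimal sketch of the kind of consistent result I am after. It reads the nested element directly with the standard library's xml.etree.ElementTree instead of feedparser. It assumes the GitHub URL always sits in `<image><link>` inside each `<item>`, as in the sample above. The `SAMPLE` string (a trimmed copy of the item with a shortened description, wrapped in a minimal `<rss><channel>` envelope that I am assuming matches the real feed) and the `image_links` helper are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample item with a shortened description. The
# <rss><channel> wrapper is an assumption about the rest of the feed.
SAMPLE = """<rss version="2.0"><channel>
<item>
<title>More Android Annotations</title>
<link>http://youtu.be/77pPceVicNI</link>
<description><![CDATA[Walkthrough text <a href="https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4">Direct download link</a>]]></description>
<image>
<url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
<link>https://github.com/FoamyGuy/StackSites</link>
<title>More Android Annotations</title>
</image>
</item>
</channel></rss>"""


def image_links(xml_text):
    """Hypothetical helper: return the <image><link> text of every <item>.

    Items without an <image><link> element yield None.
    """
    root = ET.fromstring(xml_text)
    # root.iter() needs Python 2.7+ or 3.x; on 2.6 use root.getiterator().
    return [item.findtext("image/link") for item in root.iter("item")]


links = image_links(SAMPLE)
print(links[0])
```

Because this looks up the element by its path, the anchors inside the CDATA description cannot be confused with the image link. Running it against the live feed would still require fetching the XML first, which is not shown here.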