I am trying to parse the following RSS feed from NOAA: http://www.nhc.noaa.gov/rss_examples/gis-ep-20130530.xml

It works great except for this section:

    <item>
    <title>Summary - Remnants of BARBARA (EP2/EP022013)</title>
    <guid isPermaLink="false">summary-ep022013-201305302032</guid>
    <pubDate>Thu, 30 May 2013 20:32:00 GMT</pubDate>
    <author>nhcwebmaster@noaa.gov (NHC Webmaster)</author>
    <link>
    http://www.nhc.noaa.gov/text/refresh/MIATCPEP2+shtml/302031.shtml
    </link>
    <description>
    ...BARBARA DISSIPATES... ...THIS IS THE LAST ADVISORY... As of 2:00 PM PDT Thu May         30 the center of BARBARA was located at 18.5, -94.5 with movement NNW at 3 mph. The minimum         central pressure was 1005 mb with maximum sustained winds of about 25 mph.
    </description>
    <gml:Point>
    <gml:pos>18.5 -94.5</gml:pos>
    </gml:Point>
    **<nhc:Cyclone>
            <nhc:center>18.5, -94.5</nhc:center>
            <nhc:type>REMNANTS OF</nhc:type>
            <nhc:name>BARBARA</nhc:name>
            <nhc:wallet>EP2</nhc:wallet>
            <nhc:atcf>EP022013</nhc:atcf>
            <nhc:datetime>2:00 PM PDT Thu May 30</nhc:datetime>
            <nhc:movement>NNW at 3 mph</nhc:movement>
            <nhc:pressure>1005 mb</nhc:pressure>
            <nhc:wind>25 mph</nhc:wind>
            <nhc:headline>
            ...BARBARA DISSIPATES... ...THIS IS THE LAST ADVISORY...
            </nhc:headline>
    </nhc:Cyclone>**
    </item>

The section in bold (the nhc:Cyclone element) is not being parsed by feedparser. Is there a way to ensure that custom tags are included in the parsing?

Verification:

>>> import feedparser
>>> f = feedparser.parse('http://www.nhc.noaa.gov/rss_examples/gis-ep-20130530.xml')
>>> f.entries[1]['description']
u'Shapefile last updated Thu, 30 May 2013 15:03:01 GMT'
>>> f.entries[1]['nhc_cyclone']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "feedparser.py", line 375, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'nhc_cyclone'

Full output of f: https://gist.github.com/mustafa0x/6199452

vgoff
code base 5000

1 Answer

In the current feed XML, the custom tags are actually in entry 3 (f.entries[3]), not entry 1. In addition, while feedparser does consume the custom tags, it renames them: the namespace prefix and the element name are joined with an underscore, so nhc:center is exposed as nhc_center. This is described in http://pythonhosted.org/feedparser/namespace-handling.html .

Try this (I am using version 5.1.2 of feedparser):

>>> f.entries[3].title  
u'Summary - Remnants of BARBARA (EP2/EP022013)'  
>>> f.entries[3].nhc_center  
u'18.5, -94.5'  
>>> f.entries[3].nhc_type  
u'REMNANTS OF'  
>>> f.entries[3].nhc_name  
u'BARBARA'

...and similarly for the other children of nhc:Cyclone.
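Since the position of an entry in the feed can change, it may be safer to look the entry up by the presence of the renamed keys than by a fixed index. Here is a minimal sketch, assuming each entry behaves like a dict (feedparser entries do). The helper names nhc_fields and find_cyclone_entry are made up for illustration, and the plain dicts below are stand-in data in place of f.entries:

```python
def nhc_fields(entry, prefix="nhc_"):
    """Return the namespaced fields of one entry, with the prefix stripped."""
    return {key[len(prefix):]: value
            for key, value in entry.items()
            if key.startswith(prefix)}


def find_cyclone_entry(entries, prefix="nhc_"):
    """Return the first entry carrying any prefixed key, or None."""
    for entry in entries:
        if any(key.startswith(prefix) for key in entry):
            return entry
    return None


# Stand-in data (an assumption); with feedparser this would be f.entries.
entries = [
    {"title": "Advisory shapefile"},
    {"title": "Summary - Remnants of BARBARA (EP2/EP022013)",
     "nhc_center": "18.5, -94.5",
     "nhc_type": "REMNANTS OF",
     "nhc_name": "BARBARA"},
]

cyclone = find_cyclone_entry(entries)
fields = nhc_fields(cyclone) if cyclone is not None else {}
print(fields)
```

With a real feed you would call find_cyclone_entry(f.entries) instead; which nhc_ keys actually show up depends on what your feedparser version exposes.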

Glenn
  • Thanks for your answer. It's actually entry 4, not 3. – mustafa.0x Aug 12 '13 at 06:53
  • Very strange. The feed seems to have changed even though it is an example with an old date (index 3 worked before). At any rate, glad the answer worked for you. – Glenn Aug 12 '13 at 19:14
  • Your code works; arrays are 0-indexed. Yeah, it works and I appreciate that, but I later found that the real problem I'm having is an unfixed bug: https://code.google.com/p/feedparser/issues/detail?id=256 – mustafa.0x Aug 12 '13 at 19:40
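
If feedparser does not expose everything you need from the custom tags, one possible fallback is to read them straight from the XML with the standard library. This is not feedparser's mechanism, just a rough sketch using xml.etree.ElementTree. It matches elements by local name so it does not depend on the namespace URI. The inline sample is a trimmed copy of the item from the question, and its namespace URI is a placeholder rather than the one the real feed declares. The helper names local_name and cyclone_fields are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the item from the question. The namespace URI below is a
# placeholder (an assumption); the real feed declares its own.
SAMPLE = """<rss xmlns:nhc="http://example.invalid/nhc" version="2.0">
<channel>
<item>
<title>Summary - Remnants of BARBARA (EP2/EP022013)</title>
<nhc:Cyclone>
<nhc:center>18.5, -94.5</nhc:center>
<nhc:type>REMNANTS OF</nhc:type>
<nhc:name>BARBARA</nhc:name>
</nhc:Cyclone>
</item>
</channel>
</rss>"""


def local_name(tag):
    """Strip the '{namespace-uri}' part ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def cyclone_fields(xml_text):
    """Return one dict per Cyclone element, keyed by child local name."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter():
        if local_name(element.tag) == "Cyclone":
            results.append({local_name(child.tag): (child.text or "").strip()
                            for child in element})
    return results


print(cyclone_fields(SAMPLE))
```

Matching on the local name alone ignores which namespace an element belongs to, which is fine for a feed like this one but would be too loose if two namespaces both defined a Cyclone element.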