I'm trying to use feedparser to retrieve some specific information from feeds, but also to retrieve the raw XML of each entry (i.e. the <item> elements for RSS and the <entry> elements for Atom), and I can't see how to do that. Obviously I could parse the XML by hand, but that's not very elegant, would require separate support for RSS and Atom, and I imagine it could fall out of sync with feedparser for ill-formed feeds. Is there a better way?

Thanks!

Dan Lowe
a3nm

1 Answer

I'm the current developer of feedparser. At present, one way to get that information is to monkeypatch feedparser._FeedParserMixin (or edit a local copy of feedparser.py). The methods you'll want to modify are:

  • feedparser._FeedParserMixin.unknown_starttag
  • feedparser._FeedParserMixin.unknown_endtag

At the top of each method you can insert a callback to a routine of your own that will capture the elements and their attributes as they're encountered by feedparser.
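
As a rough illustration of that monkeypatching approach, here is a minimal sketch. The helper install_tag_callbacks and the stand-in class _DemoMixin are hypothetical names made up for this example; the stand-in only exists so the snippet runs without feedparser installed. It assumes the two methods take (self, tag, attrs) and (self, tag) respectively, and that attrs arrives as a list of (name, value) pairs.

```python
import functools


def install_tag_callbacks(mixin_cls, on_start, on_end):
    """Wrap unknown_starttag/unknown_endtag on mixin_cls (hypothetical helper).

    The callbacks run first, then the original methods run unchanged.
    """
    original_start = mixin_cls.unknown_starttag
    original_end = mixin_cls.unknown_endtag

    @functools.wraps(original_start)
    def unknown_starttag(self, tag, attrs):
        on_start(tag, attrs)  # user callback sees the tag before the parser does
        return original_start(self, tag, attrs)

    @functools.wraps(original_end)
    def unknown_endtag(self, tag):
        on_end(tag)
        return original_end(self, tag)

    mixin_cls.unknown_starttag = unknown_starttag
    mixin_cls.unknown_endtag = unknown_endtag


# Stand-in for the parser mixin, for demonstration only (an assumption,
# not feedparser's real class).
class _DemoMixin:
    def unknown_starttag(self, tag, attrs):
        return ("start", tag)

    def unknown_endtag(self, tag):
        return ("end", tag)


events = []
install_tag_callbacks(
    _DemoMixin,
    on_start=lambda tag, attrs: events.append(("start", tag, dict(attrs))),
    on_end=lambda tag: events.append(("end", tag)),
)

demo = _DemoMixin()
demo.unknown_starttag("item", [("id", "1")])
demo.unknown_endtag("item")
```

With feedparser itself, the same helper would presumably be applied as install_tag_callbacks(feedparser._FeedParserMixin, on_start, on_end) before calling feedparser.parse(...). That attribute path is the one named above; it may live elsewhere in other releases, so check the installed version first.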

Kurt McKee
    Thanks a lot! That's something useful, but what I intended to do was to retrieve the full XML (including known items) for each item as a way to store them and serve them back in an aggregated feed (for instance). [Actually, I realize this wouldn't be very convenient because items could be in various formats (RSS, Atom, etc.). Maybe it would be a useful addition to feedparser to have a way to generate back XML for items in the various formats that feedparser can parse...] – a3nm Nov 05 '11 at 18:12