When using feedparser to read RSS feeds such as Business Insider's at this URL:

businessinsider.com/rss

feedparser in Python 3 seems to handle some of the attributes of each entry in the RSS feed properly, 'transform' others, and ignore or delete others. I haven't the faintest idea why.

  1. It handles these as expected: title, link, and a few other attributes that appear in the feed under the same names. OK, great.

  2. But description is completely missing. Is there a reason feedparser ignores, deletes, or hides that element of the feed?

  3. It then populates 'phantom' attributes such as 'summary' and 'summary_detail', among others. Is it transforming the feed's description into these synthetic summary fields somewhere behind the scenes?

I tried reading the documentation but could not find an explanation for this. I can't tell whether it comes from a setting or argument I am passing to feedparser or is something it does automatically as a feature.

Thanks

10mjg

1 Answer

I think the feedparser documentation does answer my question, on this page:

https://pythonhosted.org/feedparser/reference-entry-summary.html

It says:

entries[i].summary

Comes from

/atom10:feed/atom10:entry/atom10:summary
/atom03:feed/atom03:entry/atom03:summary
/rss/channel/item/description
/rss/channel/item/dc:description
/rdf:RDF/rdf:item/rdf:description
/rdf:RDF/rdf:item/dc:description

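So I guess that says it all: feedparser normalizes these differently named elements from Atom, RSS, and RDF feeds into a single summary attribute. In the feed I posted, the summary attribute is indeed the /rss/channel/item/description element.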

Now I have to read about sanitization, because I would have thought the description would come through as plain text, not as HTML, once feedparser digests it. But that is a separate issue, I suppose.
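
If plain text is what you want in the end, one option is to strip the tags yourself after parsing. This is only a rough sketch using the standard library's html.parser, not a feedparser feature; the names html_to_text and _TextExtractor are my own, and the input string is a made-up summary value.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join the text nodes and collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def html_to_text(fragment: str) -> str:
    """Return the visible text of an HTML fragment, with tags dropped."""
    extractor = _TextExtractor()
    extractor.feed(fragment)
    extractor.close()
    return extractor.text()


# Example with a made-up summary value like one feedparser might return.
print(html_to_text("<p>Hello <b>world</b> &amp; friends</p>"))
```

This drops every tag, including links and images, so it only suits cases where the markup carries nothing you need.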

10mjg