For a project, I want to use feedparser. Basically, I've got it working.
The sanitization section of the documentation says that not all content types are sanitized. How can I force feedparser to sanitize all content types?
I think the feedparser doc page you referenced gives good advice:
*It is recommended that you check the content type in e.g. entries[i].summary_detail.type. If it is text/plain then it has not been sanitized (and you should perform HTML escaping before rendering the content).*

    import html  # cgi.escape was removed in Python 3.8; html.escape replaces it

    import feedparser

    d = feedparser.parse('http://rss.slashdot.org/Slashdot/slashdot')

    # Iterate through the entries; HTML-escape any summary whose type is not text/html
    for entry in d.entries:
        if entry.summary_detail.type != 'text/html':
            print(html.escape(entry.summary))
        else:
            print(entry.summary)

Of course, there are dozens of ways you can iterate through the entries depending on what you want to do with them once they are clean.
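If you want the same check applied to every field rather than just the summary, one option is a small helper that works on any "detail" dictionary (an object with `type` and `value` keys, which is the shape of `summary_detail` and of each item in `entry.content`). This is only a rough sketch: `render_safe`, `safe_fields`, and `SANITIZED_TYPES` are names I made up, not part of feedparser, and the set of types treated as already sanitized is my assumption, so check it against the documentation for your feedparser version. Plain dicts stand in for real entries so the snippet runs without fetching a feed.

```python
import html

# Assumption: these are the types feedparser has already sanitized.
# Everything else is treated as untrusted plain text and escaped.
SANITIZED_TYPES = {"text/html", "application/xhtml+xml"}


def render_safe(detail):
    """Return text that is safe to render for a dict with 'type' and 'value' keys."""
    value = detail.get("value", "")
    if detail.get("type") in SANITIZED_TYPES:
        return value
    return html.escape(value)


def safe_fields(entry):
    """Collect safe strings from an entry's summary_detail and content list."""
    details = []
    if "summary_detail" in entry:
        details.append(entry["summary_detail"])
    details.extend(entry.get("content", []))
    return [render_safe(d) for d in details]


# Hypothetical sample data shaped like a feedparser entry.
sample = {
    "summary_detail": {"type": "text/plain", "value": "<b>hi</b> & bye"},
    "content": [{"type": "text/html", "value": "<p>ok</p>"}],
}
print(safe_fields(sample))
# → ['&lt;b&gt;hi&lt;/b&gt; &amp; bye', '<p>ok</p>']
```

With a real feed you would call `safe_fields(entry)` for each `entry` in `d.entries`, since feedparser entries support the same key-based access as a dict.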