- I am trying to use feedparser to parse feed text that I download with the aiohttp library (asyncio).
- The feed text is available HERE (it is a large document, so I am not pasting it here).
- The documentation of the feedparser.parse method (HERE on GitHub) says that you should not pass an untrusted string directly to it.
So here is my code, where I try to wrap the text in a StringIO object:

    import feedparser
    import io

    def read():
        import os
        name = os.path.join(os.getcwd(), 'extras', 'feeds',
                            'zycrypto.com_1596955288219')
        f = open(name, "r")
        text = f.read()
        f.close()
        return text

    text = read()
    parsed = feedparser.parse(io.StringIO(text))
    for i in parsed.entries:
        print(i.summary, '\n')

However, I keep getting this error:

    Traceback (most recent call last):
      File "./server/python/test.py", line 14, in <module>
        parsed = feedparser.parse(io.StringIO(text))
      File "/Users/zup/.local/share/virtualenvs/myapp_v3-kUGnE3_O/lib/python3.7/site-packages/feedparser.py", line 3922, in parse
        data, result['encoding'], error = convert_to_utf8(http_headers, data)
      File "/Users/zup/.local/share/virtualenvs/myapp_v3-kUGnE3_O/lib/python3.7/site-packages/feedparser.py", line 3574, in convert_to_utf8
        xml_encoding_match = RE_XML_PI_ENCODING.match(tempdata)
    TypeError: cannot use a bytes pattern on a string-like object

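
As far as I can tell, the TypeError itself can be reproduced with the standard library alone, without feedparser. The sketch below assumes that RE_XML_PI_ENCODING is a regex compiled from a bytes pattern, which is what the error message suggests. `BYTES_PATTERN`, `sample`, and `try_match` are hypothetical names I made up for illustration, and the pattern is only a stand-in, not feedparser's actual regex.

```python
import io
import re

# Hypothetical stand-in for feedparser's RE_XML_PI_ENCODING (assumption:
# the real one is also compiled from a bytes pattern; its regex may differ).
BYTES_PATTERN = re.compile(rb'<\?xml[^>]*encoding="([^"]+)"')

# A tiny made-up feed header, not the real feed text.
sample = '<?xml version="1.0" encoding="utf-8"?><rss></rss>'


def try_match(stream):
    """Read the whole stream and apply the bytes pattern to what was read.

    Returns the captured encoding, None if nothing matched, or a string
    describing the TypeError if the data type does not fit the pattern.
    """
    data = stream.read()  # str from StringIO, bytes from BytesIO
    try:
        match = BYTES_PATTERN.match(data)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return match.group(1) if match else None


# StringIO yields str, so the bytes pattern cannot be applied to it.
str_result = try_match(io.StringIO(sample))

# BytesIO yields bytes, so the same pattern matches.
bytes_result = try_match(io.BytesIO(sample.encode("utf-8")))

print(str_result)
print(bytes_result)
```

So the StringIO wrapper seems to hand a str to a regex that expects bytes. Encoding the text and wrapping it in io.BytesIO might avoid this particular error, but I have not confirmed that it makes the sanitizer run.
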
- How do I pass untrusted text to the Python feedparser.parse method so that the sanitizer works on it? My feed contains script tags that have not been removed. Thank you in advance.