I'm aware that unicode was changed to str in python 3 but I keep getting the same issue no matter how I write this code, can anyone tell me why?
I'm using boilerpipe for a specific set of webcrawls:
for urls in allUrls:
fileW = open('article('+ str(counter)+')', 'w')
articleDate = Article(urls)
articleDate.download()
articleDate.parse()
print(articleDate.publish_date)
fileW.write(str(Extractor(extractor='ArticleExtractor', url=urls).getText() + "\n\n\n" + str(articleDate.publish_date)+"\n\n\n"))
fileW.close
counter +=1
error:
Traceback (most recent call last):
File "/Users/Adrian/anaconda3/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 45, in __init__
self.data = unicode(self.data, encoding)
NameError: name 'unicode' is not defined
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "webcrawl.py", line 26, in <module>
fileW.write(str(Extractor(extractor='ArticleExtractor', url=urls).getText() + "\n\n\n" + str(articleDate.publish_date)+"\n\n\n"))
File "/Users/Adrian/anaconda3/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 47, in __init__
self.data = self.data.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte