I'm using BeautifulSoup and lxml to parse an HTML page. Initially I used the following code:
for item in soup.find_all("td", { "class" : re.compile(r"^(s|sb)$") }):
    data_item = (''.join(str(item.find(text=True)))).strip().lower();
I got the following error:
data_item = (''.join(str(item.find(text=True)))).strip().lower();
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 24: ordinal not in range(128)
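This error is consistent with Python 2's implicit ASCII conversion. In Python 2, calling `str()` on a `unicode` object encodes it with the default codec, which is ASCII, so any character above U+007F (here U+2019, RIGHT SINGLE QUOTATION MARK) fails. The sketch below reproduces that step explicitly with `.encode("ascii")`, so it behaves the same under Python 2 and 3. The sample string and the helper names `reproduce_ascii_failure` and `normalize_cell` are made up for illustration, and this is only a sketch of one way to keep the text as unicode, not a definitive fix.

```python
# Hypothetical sample cell text containing U+2019 (RIGHT SINGLE QUOTATION MARK).
SAMPLE = u"It\u2019s a test"


def reproduce_ascii_failure(text):
    """Encode with ASCII, the codec Python 2's str() applies implicitly to unicode.

    Returns the index of the first character that cannot be encoded,
    or None if the whole string is plain ASCII.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the position of the offending character.
        return exc.start
    return None


def normalize_cell(text):
    """Strip and lowercase the text while keeping it unicode (no str() call)."""
    return text.strip().lower()


if __name__ == "__main__":
    print(reproduce_ascii_failure(SAMPLE))  # 2, the index of u'\u2019'
    # Compare instead of printing the non-ASCII result, so stdout encoding does not matter.
    print(normalize_cell(SAMPLE) == u"it\u2019s a test")  # True
```

Skipping the `str()` call and working on the unicode value directly avoids the implicit ASCII step entirely; encoding to UTF-8 is then only needed at the point where the text is written out.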
Based on this thread, I changed my code to the following:
for item in soup.find_all("td", { "class" : re.compile(r"^(s|sb)$") }):
    data_item = u' '.join(item.find(text=True)).encode('utf-8').strip().lower();
Now I get the following error:
data_item = u' '.join(item.find(text=True)).encode('utf-8').strip();
TypeError
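The `TypeError` message is cut off above, so the cause can only be guessed. One likely candidate: `item.find(text=True)` returns `None` when a cell has no text node, and `u' '.join(None)` raises `TypeError`. Even when `find` does return a string, `join` iterates over it character by character, so the spaces end up between letters. The sketch below mimics both cases in plain Python without BeautifulSoup; `join_demo` and `cell_text` are hypothetical helpers, and `cell_text` assumes its argument is whatever `find(text=True)` returned (a string or `None`).

```python
def join_demo(value):
    """Mimic the second attempt: u' '.join(...) over whatever find() returned."""
    return u" ".join(value)


def cell_text(found):
    """Hypothetical helper: normalize the result of item.find(text=True).

    `found` is either a string (the first text node) or None (no text node).
    """
    if found is None:
        return u""
    # No join and no encode: strip() and lower() work directly on unicode text.
    return found.strip().lower()


if __name__ == "__main__":
    # Joining a string joins its characters, not words.
    print(join_demo(u"abc") == u"a b c")  # True
    try:
        join_demo(None)
    except TypeError as exc:
        print("TypeError: %s" % exc)
    print(cell_text(None) == u"")  # True
    print(cell_text(u"  Hello World ") == u"hello world")  # True
```

Checking for `None` before touching the value, and dropping the `join`/`encode` steps, keeps each cell as a single lowercase unicode string.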
What should I do?