
I'm using BeautifulSoup and lxml to parse an HTML page. At first I used the following code:

for item in soup.find_all("td", { "class" : re.compile(r"^(s|sb)$") }):
    data_item = (''.join(str(item.find(text=True)))).strip().lower();

I got the following error:

 data_item = (''.join(str(item.find(text=True)))).strip().lower();
 UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 24: ordinal not in range(128)

Following this thread, I changed my code to the following:

for item in soup.find_all("td", { "class" : re.compile(r"^(s|sb)$") }):
      data_item = u' '.join(item.find(text=True)).encode('utf-8').strip().lower();

Now I get the following error:

data_item = u' '.join(item.find(text=True)).encode('utf-8').strip();
TypeError

What should I do?

  • What is the actual `TypeError` you get, including traceback? – Katriel Dec 12 '12 at 15:38
  • It is the error I got. There is no other information. File "get_financials.py", line 58, in get_single_page data_item = u' '.join(item.find(text=True)).encode('utf-8').strip().lower(); TypeError –  Dec 12 '12 at 16:04
  • Perhaps you should split up that line so you can find out what part of it is causing the exception. You could certainly do the encoding, stripping and lower casing in a separate step, at least, and probably other bits as well. – Blckknght Dec 12 '12 at 16:31
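
Following the suggestion to split the line up, here is a minimal sketch of the clean-up done one step at a time. The helper name `normalize_cell_text` is hypothetical and not part of the original code. It assumes its argument is whatever `item.find(text=True)` returned, which is `None` when the cell has no text node. Passing `None` to `join` raises a `TypeError`, so that is one possible cause of the error above, though nothing in the question confirms it.

```python
def normalize_cell_text(text):
    """Strip and lower-case one text node (hypothetical helper).

    `text` is assumed to be the result of item.find(text=True):
    a Unicode string, or None if the cell holds no text node.
    """
    # Guard first: u' '.join(None) would raise TypeError.
    if text is None:
        return None

    # Step 1: trim surrounding whitespace while still Unicode text.
    stripped = text.strip()

    # Step 2: lower-case, still as Unicode text.
    lowered = stripped.lower()

    # Encoding to bytes (e.g. lowered.encode('utf-8')) can be left
    # until the value is actually written out.
    return lowered


# Example values, not taken from the real page.
print(repr(normalize_cell_text(u"  Shareholders\u2019 Equity ")))
print(repr(normalize_cell_text(None)))
```

With each step on its own line, a traceback points at the exact operation that fails. Note also that `u' '.join(some_string)` iterates over the characters of the string and puts a space between each one, so the `join` is probably not needed when `find` returns a single string.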
