
I wrote the following Python script to retrieve information from the Yahoo Finance website and store it in a file:

import urllib.request
from bs4 import BeautifulSoup
in_data = open('list_of_companies.txt','r',encoding='utf-8', errors='ignore')
for line in in_data:
    page = urllib.request.urlopen('http://finance.yahoo.com/rss/headline?s='+line.strip())
    page = page.read()
    page = page.decode('utf-8','ignore')
    soup = BeautifulSoup(page)
    ans = line[0:len(line.strip())]
    ans += '.txt'
    f1 = open(ans,'w')
    f1.write(soup.prettify())
    f1.close()

It takes each ticker symbol from list_of_companies.txt and stores the retrieved page in a file named after that symbol. However, when I run this script, I get the following error:

Traceback (most recent call last):
  File "C:\Users\Darshil Babel\Desktop\NewsContent\s.py", line 12, in <module>
    f1.write(soup.prettify())
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03cb' in position 1556: character maps to <undefined>

I am using the urllib and BeautifulSoup modules for this. Can somebody help me fix it?
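The error can be reproduced without any network access. The traceback passes through `cp1252.py`, which indicates that `open()` fell back to the Windows code page cp1252 because no encoding was given. The minimal sketch below checks whether the character from the error message survives that codec; the helper `can_encode` and the sample string are hypothetical, written only for this illustration.

```python
import locale


def can_encode(text, encoding):
    """Return True if every character in text can be represented in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The character named in the traceback: U+03CB, GREEK SMALL LETTER UPSILON WITH DIALYTIKA.
sample = 'headline with \u03cb'

# The encoding open() falls back to when none is given (platform dependent);
# on the asker's Windows setup it was cp1252, as the traceback shows.
print(locale.getpreferredencoding(False))

print(can_encode(sample, 'cp1252'))  # False: cp1252 has no mapping for U+03CB
print(can_encode(sample, 'utf-8'))   # True: UTF-8 can encode U+03CB
```

Because the default encoding differs between machines, the same script may work on one system and fail on another, which is why naming the encoding explicitly is the reliable fix.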

stalk
Darshil Babel

1 Answer


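You opened your output file without specifying an encoding, so Python falls back to the platform default (cp1252 on your Windows system, as the traceback shows), which cannot represent characters such as '\u03cb'. Specify a Unicode-compatible encoding instead: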

f1 = open(ans,'w', encoding='utf-8')
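A minimal sketch of the fix in isolation, using only the standard library: text containing the offending character is written with an explicit UTF-8 encoding and read back unchanged. `save_text`, `load_text`, the sample string and the `AAPL.txt` filename are hypothetical stand-ins for the question's `soup.prettify()` output and `ans` filename. The `with` statement is an optional extra beyond the one-line change; it closes the file even if `write` raises.

```python
import os
import tempfile


def save_text(path, text):
    """Write text to path as UTF-8, closing the file even if write() raises."""
    # Without encoding=..., open() uses the platform default
    # (cp1252 in the question's traceback), which cannot encode '\u03cb'.
    with open(path, 'w', encoding='utf-8') as out:
        out.write(text)


def load_text(path):
    """Read the file back with the same encoding it was written in."""
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


# Hypothetical stand-in for soup.prettify() output containing the character from the error.
page_text = '<title>Headline with \u03cb</title>\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'AAPL.txt')  # 'AAPL' is just an example symbol
    save_text(path, page_text)
    assert load_text(path) == page_text  # round trip preserves U+03CB
```

In the question's loop, the equivalent change is replacing the `open`/`write`/`close` lines with `save_text(ans, soup.prettify())`. Any program that later reads these files should also open them with `encoding='utf-8'`.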
Tim Pietzcker