I am getting the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 587: ordinal not in range(128)
My code:
import os
from bs4 import BeautifulSoup

# Raw strings so the backslashes in the Windows paths are taken literally.
do = dir_with_original_files = r'C:\Users\Me\Directory'
dm = dir_with_modified_files = r'C:\Users\Me\Directory\New'

for root, dirs, files in os.walk(do):
    for f in files:
        if f.endswith('~'):  # skip backup files
            continue
        original_file = os.path.join(root, f)
        # Build the output name, e.g. page.html -> page_mod.html.
        # The two lines below can be dropped (leaving mf = f), since the
        # modified files go to a separate directory anyway.
        mf = f.split('.')
        mf = ''.join(mf[:-1]) + '_mod.' + mf[-1]
        modified_file = os.path.join(dm, mf)
        with open(original_file, 'r') as orig_f, \
                open(modified_file, 'w') as modi_f:
            soup = BeautifulSoup(orig_f.read())
            for t in soup.find_all('td', class_='test'):
                t.string.wrap(soup.new_tag('h2'))
            # Write the modified document to the new file.
            modi_f.write(soup.prettify())
This code walks a directory and, for each file, finds every td element with class "test" and wraps the text inside it in an h2 tag. So a cell that previously looked like this:
<td class="test"> text </td>
After running the program, a new file is created containing:
<td class="test"> <h2>text</h2> </td>
At least, that is how I would like it to work. Instead, I get the error shown above. I believe this is because some of the files I am parsing contain Spanish text with accented characters (the u'\xe1' in the error message is "á").
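To check that suspicion, here is a minimal sketch, separate from my script, of what I think is going on. The sample string, the helper names (ascii_encode_fails, write_utf8) and the temporary file are all made up for illustration. My assumption is that the text I pass to write() gets encoded as ASCII somewhere along the way, and that encoding it explicitly as UTF-8 would avoid the error, but I have not confirmed that this is the right change for the script above.

```python
import io
import os
import tempfile

# Hypothetical sample text; u'\xe1' is the character named in the error message.
sample = u'<td class="test"><h2>\xe1rbol</h2></td>'


def ascii_encode_fails(text):
    """Return True if encoding `text` as ASCII raises UnicodeEncodeError."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


def write_utf8(path, text):
    """Write unicode `text` to `path`, encoding it explicitly as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as out:
        out.write(text)


if __name__ == '__main__':
    # The accented character cannot be represented in ASCII.
    print(ascii_encode_fails(sample))

    # Writing through a file opened with an explicit encoding succeeds.
    path = os.path.join(tempfile.mkdtemp(), 'out.html')
    write_utf8(path, sample)
    with io.open(path, 'r', encoding='utf-8') as check:
        print(check.read() == sample)
```

If that assumption holds, the equivalent change in my script would presumably be to open modified_file with io.open(..., 'w', encoding='utf-8'), but I would like confirmation that this is the correct approach.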
What can I do to fix my issue?