I have a strange problem converting special characters from HTML. In my Django project, text is stored HTML-encoded in a MySQL database so that none of its formatting is lost.
Before I can work on the text (for example, to calculate positions), I first have to decode it and strip all HTML tags. I do this with BeautifulSoup:
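from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x
# Step 1: decode HTML entities; step 2: keep only the text nodes
convertedText = str(BeautifulSoup(text.text, convertEntities=BeautifulSoup.HTML_ENTITIES))
convertedText = ''.join(BeautifulSoup(convertedText).findAll(text=True))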
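For comparison, the same two steps (decode entities, drop tags) can be sketched with only the standard library. This is a minimal sketch assuming Python 3, not the code running on either of my servers; `TextExtractor` and `strip_html` are hypothetical names, and the `&auml;` in the usage line is an assumed form of the stored value.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &auml; before
        # handle_data() is called.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.parts.append(data)


def strip_html(fragment):
    """Return the decoded text of an HTML fragment without any tags."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text that is still buffered
    return "".join(parser.parts)


# Assumed input: the stored value with the umlaut written as an entity.
print(strip_html("<p>bassverst&auml;rker</p>"))
```

Because this version never passes through a byte string, it would at least show whether the stored value itself is intact.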
On Django's default test server everything works fine, but on my production server special characters are converted incorrectly.
An example:
Test server
MySQL-Query gives me: <p>bassverstärker</p>
is correctly converted to: bassverstärker
Production server
MySQL-Query gives me: <p>bassverstärker</p>
This is wrongly converted to: bassverst\ucc44rker
Somehow the ä is converted into \ucc44, and this results in a wrong character.
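To make the difference between the two results visible, the non-ASCII characters can be listed with their code points and Unicode names. This is a small diagnostic sketch assuming Python 3; `describe_non_ascii` is a hypothetical helper, and the two strings are simply the outputs quoted above.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (code point, Unicode name) for every non-ASCII character."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 127
    ]


expected = "bassverst\u00e4rker"  # result on the test server
observed = "bassverst\ucc44rker"  # result on the production server

print(describe_non_ascii(expected))
print(describe_non_ascii(observed))
```

The expected character is U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS), while U+CC44 lies in the Hangul Syllables block, so the result is a completely different character rather than a slightly mangled ä.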
My configuration:
Test server:
- Django built-in server (python manage.py runserver)
- BeautifulSoup 3.2.1
- Python 2.6.5
- Ubuntu (kernel 2.6.32-43-generic)
Production server:
- Cherokee 1.2.101
- BeautifulSoup 3.2.1
- Python 2.7.3
- Ubuntu (kernel 3.2.0-32-generic)
Because I don't know at which level the error occurs, I would like to ask if anybody can help me with this. Many thanks in advance.