0

I made the mistake of accidentally using non-ascii characters in a form that was submitted into a database using SQLAlchemy, running on Flask. Basically, rather than using the ASCII hyphen , I used the unicode en-dash . I am trying to now go back and replace all occurrences of the en-dash with a hyphen in my database.

Let's say I have a users table, and the column I'm trying to change is called occupation. I'm able to figure out which entries in my database have the invalid character, because when I run:

User.query.get(id)

if the user has an invalid ASCII character, it returns

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 167: ordinal not in range(128)

So how can I go about replacing all occurrences of the en-dash with a hyphen in the occupation column for all rows in my DB?

Jason B
  • 7,097
  • 8
  • 38
  • 49
  • Have you tried doing something like this to perform queries? http://stackoverflow.com/questions/10819192/sqlalchemy-result-for-utf-8-column-is-of-type-str-why – pgorsira Jul 25 '14 at 20:27
  • Then you can just replace the en-dash in each record and persist the change. – pgorsira Jul 25 '14 at 20:27

1 Answers1

0

I was able to fix this by running a script over all entries in my database, and replacing the ones with faulty characters.

from user.models import *
for u in User.query.all():
    # \u2013 is unicode for en-dash
    if u"\u2013" in u.occupation:
        # replace with normal hyphen
        u.occupation = u.occupation.replace(u"\u2013", "-")
        db.session.commit()
Jason B
  • 7,097
  • 8
  • 38
  • 49