I have a text field (UTF-8) that misbehaves a little. As originally fetched, it includes the character sequence \u00e2\u0080\u0099,
which is what an apostrophe (U+2019) looks like when its UTF-8 bytes are misread as Latin-1. I have decided to keep this corrupted state rather than repair it, even though I have found a few solutions online for reinterpreting this kind of error in my text field.
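For context, here is a minimal sketch of how such a sequence can arise, assuming Python 3 and assuming the original character was U+2019 (the variable names are only for illustration):

```python
# U+2019 (right single quotation mark) takes three bytes in UTF-8.
apostrophe = "\u2019"
utf8_bytes = apostrophe.encode("utf-8")      # b'\xe2\x80\x99'

# Decoding those bytes as Latin-1 maps each byte to its own code point,
# which produces the three-character sequence seen in the field.
mojibake = utf8_bytes.decode("latin-1")      # '\u00e2\u0080\u0099'

stored = "don" + mojibake + "t"
print(ascii(stored))             # 'don\xe2\x80\x99t'
print(stored.encode("utf-8"))    # b'don\xc3\xa2\xc2\x80\xc2\x99t'
```

Note that a proper UTF-8 encoding of the corrupted string never contains a bare 0x80 byte; each of the three characters becomes a two-byte sequence.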
So I just want to insert the raw data as is. I have tried two ways to insert this row.
- Using Python with peewee (an ORM library).
With everything configured correctly, this method actually works: the data is inserted and
there is a row in the database.
Selecting the column yields: donâ\u0080\u0099t
which I am OK with keeping.
So far so good.
- Writing a Python script that prints tab-delimited text and loading it with \copy.
Annoyingly, this method does not work and returns the following error:
ERROR: invalid byte sequence for encoding "UTF8": 0x80
CONTEXT: COPY comment, line 1: "'donâ\x80\x99t'"
(When I print the data from the Python script to the console, it shows up as donâ\x80\x99t.)
So there is clearly a difference between what peewee does and my naive printing of the string from Python (peewee and print receive the same string as input).
How do I encode this string correctly so that I can use \copy to populate the row?
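One thing that may be worth checking is the exact bytes the script writes. The following is only a sketch, assuming Python 3 and assuming the value is a str rather than a byte string. The helper names copy_escape and copy_line are hypothetical. It encodes each line as UTF-8 explicitly and writes the bytes to stdout, instead of relying on print and the console encoding:

```python
import sys


def copy_escape(value: str) -> str:
    """Escape characters that are special in COPY's text format."""
    return (
        value.replace("\\", "\\\\")   # backslash is the escape character
        .replace("\t", "\\t")         # tab is the column delimiter
        .replace("\n", "\\n")         # newline is the row terminator
        .replace("\r", "\\r")
    )


def copy_line(fields) -> bytes:
    """Join the fields with tabs and encode the whole line as UTF-8."""
    line = "\t".join(copy_escape(f) for f in fields) + "\n"
    return line.encode("utf-8")


if __name__ == "__main__":
    row = ["don\u00e2\u0080\u0099t"]
    # Write raw bytes so the output does not depend on the locale.
    sys.stdout.buffer.write(copy_line(row))
```

If the output of this sketch still triggers the same error, the input is probably not the str I think it is (for example, it may already be a byte string or a repr of one).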