Inserting badly encoded text into a PostgreSQL table despite encoding errors with \copy

Question

I have a text field (UTF-8) that misbehaves a little. As originally fetched, it includes the character sequence \u00e2\u0080\u0099, which is basically an apostrophe in another encoding. I have decided to keep this corrupted state rather than repair it, even though I have found a few solutions online for reinterpreting this kind of error in my text field.
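For context, a minimal sketch of how such a sequence typically arises, assuming Python 3 and assuming the original character was U+2019 (the curly apostrophe) whose UTF-8 bytes were decoded as Latin-1 somewhere upstream. The variable names are illustrative only.

```python
# Assumption: the original character was U+2019 RIGHT SINGLE QUOTATION MARK.
apostrophe = "\u2019"

# Its UTF-8 encoding is three bytes: E2 80 99.
utf8_bytes = apostrophe.encode("utf-8")

# Decoding those bytes as Latin-1 maps each byte to one code point,
# producing the three-character sequence from the question.
mojibake = utf8_bytes.decode("latin-1")
print(ascii(mojibake))  # '\xe2\x80\x99'

# The reverse round trip would repair it (not what is wanted here).
repaired = mojibake.encode("latin-1").decode("utf-8")
```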

So I just want to insert the raw data as is. I have tried two ways to insert this row.

Using Python with peewee (an ORM library). With everything configured correctly, this method actually works: the data is inserted and there is a row in the database.
Selecting the column yields: don├ó\u0080\u0099t, which I am OK with keeping.

So far so good.

Writing a Python script that prints tab-delimited text and using \copy.
Annoyingly, this method does not work and returns the following error:

ERROR:  invalid byte sequence for encoding "UTF8": 0x80
CONTEXT:  COPY comment, line 1: "'donâ\x80\x99t'"

(When printing the data from the Python script to the console, it shows up as donâ\x80\x99t.)

Thus there is clearly a difference between what peewee does and my naive printing of the string from Python (peewee and print receive the same string as input).
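One way to narrow down such a difference is to look at the bytes rather than the rendered text. This is a diagnostic sketch only, assuming Python 3.8+ and assuming the string holds the three code points U+00E2, U+0080, U+0099. It does not establish what peewee actually sends. Which encoding print uses depends on sys.stdout.encoding, which varies by platform and by whether the output is piped.

```python
import sys

# Assumption: this is the string handed to both peewee and print.
text = "don\u00e2\u0080\u0099t"

# The same string produces different bytes under different encodings.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

print("stdout encoding:", sys.stdout.encoding)
print("utf-8  :", as_utf8.hex(" "))    # 64 6f 6e c3 a2 c2 80 c2 99 74
print("latin-1:", as_latin1.hex(" "))  # 64 6f 6e e2 80 99 74
```

Comparing these with a hex dump of the file passed to \copy should show which encoding, if either, the script actually emitted.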

How do I encode this string correctly so I can use \copy to populate the row?
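One possible approach, offered as a sketch and not a confirmed fix, is to bypass print's implicit encoding and write explicitly UTF-8 encoded bytes, escaping each field for COPY's text format. It assumes Python 3, a UTF8 database, and a psql client_encoding of UTF8. The helpers copy_escape and copy_line are hypothetical names introduced for illustration.

```python
import sys

def copy_escape(value):
    """Escape one field for COPY text format (hypothetical helper)."""
    if value is None:
        return "\\N"  # COPY's default NULL marker
    # The backslash must be escaped first so later escapes are not doubled.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def copy_line(fields):
    """Build one tab-delimited row and encode it explicitly as UTF-8."""
    return ("\t".join(copy_escape(f) for f in fields) + "\n").encode("utf-8")

if __name__ == "__main__":
    # Write bytes directly so the console or locale encoding is not involved.
    sys.stdout.buffer.write(copy_line(["don\u00e2\u0080\u0099t"]))
```

Redirected to a file (for example a hypothetical rows.tsv), the output could then be loaded with \copy comment from 'rows.tsv' after running \encoding UTF8 in psql. Note also that the single quotes visible in the CONTEXT line suggest the script printed quotes around the value. COPY's text format treats such quotes as literal data.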

Take a look at the answer from [Wim](https://stackoverflow.com/a/66815577/9267296) on this [question](https://stackoverflow.com/q/27996448/9267296). — Edo Akse, May 29 '22 at 02:45
I know "You're doing it wrong" answers are a pain, but really you should fix the encoding problems as early on as possible, ie. I'm challenging your decision to insert the raw data as-is. It's like you detect some loose part on your car, but you decide to fix it later. Then you only ask for help after it fell off during the next ride. You're more likely to get help for an easier problem (properly attaching the part) than a cumbersome one (looking for the lost part on the side of the road). — lenz, May 29 '22 at 08:02
I do understand that but I seriously just want to insert the raw data as is despite the problem — Yorai Levi, May 29 '22 at 09:20
