4

I am importing data from a file to PostgreSQL database table using COPY FROM. Some of the strings in my file contain hex characters (mostly \x0d and \x0a) and I'd like them to be converted into regular text using COPY.
My problem is that they are treated as regular text and remain in the string unchanged.
How can I get the hex values converted?

Here is a simplified example of my situation:

-- The table I am importing to
CREATE TABLE my_pg_table (
    id serial NOT NULL,
    value text
);

COPY my_pg_table(id, data)
FROM 'location/data.file'
WITH CSV
DELIMITER '    ' -- this is actually a tab
QUOTE ''''
ENCODING 'UTF-8'

Example file:

1    'some data'  
2    'some more data \x0d'  
3    'even more data \x0d\x0a'

Note: the file is tab delimited.

Now, doing:

SELECT * FROM my_pg_table

would get me results containing hex.

Additional info for context:

My task is to export data from sybase tables (many hundreds) and import to Postgres. I am using UNLOAD to export data to files like so:

UNLOAD
TABLE my_sybase_table
TO 'location/data.file'
DELIMITED BY '    ' -- this is actually a tab
BYTE ORDER MARK OFF
ENCODING 'UTF-8'
Harry
  • 181
  • 5
  • 13

1 Answers1

1

It seems to me that (for a reason I don't understand) hex is only converted when using FORMAT TEXT and FORMAT CSV will treat it as regular string.

Solving the problem in my situation:
Because I had to use TEXT I didn't have the QUOTE option anymore and because of that I couldn't have quoted strings in my files anymore. So I needed my files in a little different format and eventually used this to export my table from sybase:

UNLOAD
    SELECT
    COALESCE(cast(id as long varchar), '(NULL)'),
    COALESCE(cast(data as long varchar), '(NULL)')
    FROM my_sybase_table
TO 'location/data.file'
DELIMITED BY '    ' -- still tab delimited
BYTE ORDER MARK OFF
QUOTES OFF
ENCODING 'UTF-8'

and to import it to postgres:

COPY my_pg_table(id, data)
FROM 'location/data.file'
DELIMITER '    ' -- tab delimited
NULL '(NULL)'
ENCODING 'UTF-8'

I used (NULL), because I needed a way to differentiate between an empty string and null. I casted every column to long varchar, to make my mass export/import more convenient.

I'd be still very interested to know why hex wouldn't convert when using FORMAT CSV.

Harry
  • 181
  • 5
  • 13
  • 1
    The answer to the question in your last paragraph: https://www.postgresql.org/docs/current/static/sql-copy.html section File Formats: "backslash is not a special character in the CSV format" – azalea Feb 16 '17 at 17:41