7

We are exporting data from Postgres 9.3 into a text file for ingestion by Spark.

We would like to use the ASCII 31 field separator character as a delimiter instead of \t so that we don't have to worry about escaping issues.

We can do so in a shell script like this:

#!/bin/bash
DELIMITER=$'\x1F'
echo "copy ( select * from table limit 1) to STDOUT WITH DELIMITER '${DELIMITER}'" | (psql ...) > /tmp/ascii31

But we're wondering, is it possible to specify a non-printable glyph as a delimiter in "pure" postgres?

edit: we attempted to use the postgres escaping convention per http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html

warehouse=> copy ( select * from table limit 1) to STDOUT WITH DELIMITER '\x1f';

and received

ERROR:  COPY delimiter must be a single one-byte character
jaegard
  • 171
  • 1
  • 7

3 Answers3

15

Try prepending E before the sequence you're trying to use as a delimter. For example E'\x1f' instead of '\x1f'. Without the E PostgreSQL will read '\x1f' as four separate characters and not a hexadecimal escape sequence, hence the error message.

See the PostgreSQL manual on "String Constants with C-style Escapes" for more information.

Gregory Arenius
  • 2,904
  • 5
  • 26
  • 47
  • Brilliant answer. So important to always, always, always prepend E before any sort of hex, regex or special character character within Postgres scripts. Sometimes it will work without but there will always be that one time when it doesn't for no apparent reason. – pele88 Dec 09 '16 at 23:02
2

From my testing, both of the following work:

echo "copy (select 1 a, 2 b) to stdout with delimiter u&'\\001f'"| psql;

echo "copy (select 1 a, 2 b) to stdout with delimiter e'\\x1f'"| psql;
bgoldst
  • 34,190
  • 6
  • 38
  • 64
  • 1
    Hmmm, neither of those work in the psql shell, both produce "ERROR: COPY delimiter must be a single one-byte character" – jaegard Feb 17 '15 at 20:24
  • Hmm, strange. What postgres version are you running, and what OS? (I get `psql (PostgreSQL) 9.3.5` from `psql --version`, running on Windows 8.1) – bgoldst Feb 18 '15 at 01:46
  • 1
    Actually, when you say "psql shell", do you mean you've copied the double-quoted text in the above commands and pasted it at the `psql` prompt? That won't work; the double backslash is an escape code in a shell double-quoted string, that will resolve to a single backslash before being echoed to `psql`. There needs to be just a single backslash in the code that `psql` sees. Let me know if this helps. – bgoldst Feb 18 '15 at 13:22
  • Did anyone resolve this? I'm looking to do the same thing. I get `ERROR: invalid byte sequence for encoding "UTF8": 0x00` when I do all kinds of combinations of escaping. – squarism Jun 24 '15 at 18:25
0

I've extracted a small file from Actian Matrix (a fork of Amazon Redshift, both derivatives of postgres), using this notation for ASCII character code 30, "Record Separator".

unload ('SELECT btrim(class_cd) as class_cd, btrim(class_desc) as class_desc
FROM transport.stg.us_fmcsa_carrier_classes')
to '/tmp/us_fmcsa_carrier_classes_mk4.txt'
delimiter as '\036' leader;

This is an example of how this file looks in VI:

C^^Private Property
D^^Private Passenger Business
E^^Private Passenger Non-Business

I then moved this file over to a machine hosting PostgreSQL 9.5 via sftp, and used the following copy command, which seems to work well:

copy fmcsa.carrier_classes
from '/tmp/us_fmcsa_carrier_classes_mk4.txt'
delimiter u&'\001E'; 

Each derivative of postgres, and postgres itself seems to prefer a slightly different notation. Too bad we don't have a single standard!