0

I have a table with 300,000 rows not under my control that I need to import.

I export the data from mysql using:

mysqldump -u root --password=secret --fields-enclosed-by='\"' -T/tmp apflora_beob

Then I try to import each table using for example:

\copy beob_evab FROM '/Users/alex/tmp/beob_evab.txt' (FORMAT 'csv', DELIMITER E'\t', NULL '\N', ESCAPE E'\"', ENCODING 'UTF8');

This works for most tables, also for one containing 450,000 rows. But on one I get this error:

ERROR:  value too long for type character varying(10)
CONTEXT:  COPY beob_evab, line 190310, column COUV_MOUSSES: "2\%  \N  \N  \N  \N  \N  \N  \N  \N  \N  \N  30  \N  15  \N  \N  \N  \N  \N  \N  \N  \N  \N  \N  \N  \N  0.01  \N  \N
Bachs, Dau..."

When I check field COUV_MOUSSES in line 190'310 it contains this value: 2"%.

What would I have to change to get this to work?

In order to achieve the correct output "2""%" according to @klin (thanks!) I would have to do add --fields-escaped-by='"' to the mysqldump command. So that becomes:

mysqldump -u root --password=secret --fields-enclosed-by='"' --fields-escaped-by='"' -T/tmp apflora_beob

Problem is: now Null values are exported as "N.

Well, no problem, I thought and changed the copy command to:

\copy beob_evab FROM '/Users/alex/tmp/beob_evab.txt' (FORMAT 'csv', DELIMITER E'\t', NULL '"N', ESCAPE "'", ENCODING 'UTF8');

This produces the following error in Postgres:

ERROR:  CSV quote character must not appear in the NULL specification

And it seems that the representation of Null values can not be changed in MySQL.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Alex
  • 2,117
  • 5
  • 28
  • 36
  • Shouldn't be `mysqldump -u root --password=secret --fields-enclosed-by='"' -T/tmp apflora_beob`? – klin Feb 07 '16 at 22:02
  • yeah `--fields-enclosed-by='"'` works too. Seems that it works the same as `--fields-enclosed-by='\"'`. But does not prevent MySQL from exporting `"N` as null values – Alex Feb 07 '16 at 22:14
  • I mean: `mysqldump -u root --password=secret --fields-enclosed-by='"' --fields-escaped-by='\\' -T/tmp apflora_beob` – klin Feb 07 '16 at 22:18
  • according to https://dev.mysql.com/doc/refman/5.5/en/load-data.html that is standard behavior, same as not using `--fields-escaped-by='\\'. And the same error happens as mentioned in the original question – Alex Feb 07 '16 at 22:34
  • I've tried this mysqldump and it seems really stubborn. Maybe you should do some post processing with `sed` (`"N\t -> \\N\t` or something similar). – klin Feb 07 '16 at 23:00

1 Answers1

1

The value should be written down as

"2""%"

Per RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files:

[5]. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.(...)

[7]. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

   "aaa","b""bb","ccc"
Community
  • 1
  • 1
klin
  • 112,967
  • 15
  • 204
  • 232