0

I need to import geocoding results for 100,000 addresses to my database. The results are available in a CSV file. My colleague imported it using mySQL. However, I am working with PostgreSQL and importing does not work. I am trying to fill an empty table with 42 columns. The second column contains the raw data from the geocoding API as text. The subsequent columns are separated by commas. Please see an example of one address (should appear as one row in my database later on) in the way it appears in the CSV. To make it clear: 1st column should be filled with 00012 VILLANOVA DI GUIDONIA, IT, 2nd column with raw data, 3rd column with OK, 4th column with 1, 5th column with Lazio, etc..

00012 VILLANOVA DI GUIDONIA, IT,"{\
   \"results\" : [\
      {\
         \"address_components\" : [\
            {\
               \"long_name\" : \"Villanova\",\
               \"short_name\" : \"Villanova\",\
               \"types\" : [ \"locality\", \"political\" ]\
            },\
            {\
               \"long_name\" : \"Guidonia\",\
               \"short_name\" : \"Guidonia\",\
               \"types\" : [ \"administrative_area_level_3\", \"political\" ]\
            },\
            {\
               \"long_name\" : \"Città Metropolitana di Roma\",\
               \"short_name\" : \"RM\",\
               \"types\" : [ \"administrative_area_level_2\", \"political\"      ]\
            },\
            {\
               \"long_name\" : \"Lazio\",\
               \"short_name\" : \"Lazio\",\
               \"types\" : [ \"administrative_area_level_1\", \"political\"     ]\
            },\
            {\
               \"long_name\" : \"Italy\",\
               \"short_name\" : \"IT\",\
               \"types\" : [ \"country\", \"political\" ]\
            },\
            {\
               \"long_name\" : \"00012\",\
               \"short_name\" : \"00012\",\
               \"types\" : [ \"postal_code\" ]\
            }\
         ],\
         \"formatted_address\" : \"00012 Villanova RM, Italy\",\
         \"geometry\" : {\
            \"bounds\" : {\
               \"northeast\" : {\
                  \"lat\" : 41.9732526,\
                  \"lng\" : 12.7654465\
               },\
               \"southwest\" : {\
                  \"lat\" : 41.9560237,\
                  \"lng\" : 12.745568\
                }\
            },\
            \"location\" : {\
               \"lat\" : 41.96298669999999,\
               \"lng\" : 12.7566109\
            },\
            \"location_type\" : \"APPROXIMATE\",\
            \"viewport\" : {\
               \"northeast\" : {\
                  \"lat\" : 41.9732526,\
                  \"lng\" : 12.7654465\
               },\
               \"southwest\" : {\
                  \"lat\" : 41.9560237,\
                  \"lng\" : 12.745568\
               }\
            }\
          },\
          \"place_id\" : \"ChIJQbO9a615LxMRMpoTF6_GZ2I\",\
         \"types\" : [ \"locality\", \"political\" ]\
      }\
   ],\
   \"status\" : \"OK\"\
}\
,OK",1,"Lazio","Città Metropolitana di Roma","Guidonia",\N,\N,\N,"Italy"\N,\N,"Villanova",\N,\N,\N,\N,\N,"00012",\N,\N,\N,\N,\N,\N,\N,\N,\N,\N,\N,\N,\N,\N,\N,41.962986,12.756611,"locality","locality, political",3148330,"00012 Villanova RM, Italy","2017-10-30 17:28:40"

I was using the hint in PostgreSQL csv import from a MySQL csv export?

and tried

\COPY addresses_googleresponse from 'myfile.csv' with delimiter AS ',' NULL AS '\\N' ESCAPE E'\\' CSV

-> error: unterminated csv quoted field

Using the hint in

unterminated CSV quoted field in Postgres, I tried

\COPY addresses_googleresponse from 'myfile.csv' with delimiter AS ',' NULL AS '\\N' ESCAPE E'\\' quote E'\b'  CSV

-> error: missing data for column “numresults”

I see potential problems with the second column in this format, but that's how it looks like. Any ideas how to import it with \copy (and hopefully without preprocessing) are more than welcome (I cannot use copy, only \copy).

Florian Seliger
  • 421
  • 4
  • 16
  • I don't think `\copy` can handle multi-line records. You will need a different tool to import that data. Do you know which tool your colleague used? –  Nov 07 '17 at 07:49
  • I think your problem can be solved by removing the \r\n you have in your CVS file, you need to have one line for a row. – Hervé Piedvache Nov 07 '17 at 07:54

2 Answers2

0

Your CSV file is not correct.

The file starts with

00012 VILLANOVA DI GUIDONIA, IT,"{\

and you want the first field to be 00012 VILLANOVA DI GUIDONIA, IT, right?

But there is no way to tell that the first comma is part of a field, while the second comma separates fields.

You'd have to surround the address with quotes.

Similarly, the ,OK", that you'd like to be the third field is actually the end of the multi-line data field, because it is inside the quotes that delimit that field.

There is no problem with having multi-line data as long as you get the quoting right.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
0

We solved the problem. For those who are intereseted, the following code works:

\COPY addresses_googleresponse from 'mytable.csv' with csv delimiter ',' null '\N' escape '\'
Florian Seliger
  • 421
  • 4
  • 16