-1

I'm trying to import a large geojson file into a Postgres table. In order to do so, I first convert the json into csv with python:

import pandas as pd
df = pd.read_json('myjson.txt')
df.to_csv('myjson.csv',sep='\t')

The resulting csv looks like:

        name    type    features
0       geobase     FeatureCollection       {'type': 'Feature', 'geometry': {'type': 'LineString', 'coordinates': [[-73.7408048408216, 45.5189595588608], [-73.7408749973688, 45.5189893490944], [-73.7409267622838, 45.5190212771795], [-73.7418867072278, 45.519640108602204], [-73.7419636417947, 45.5196917400376]]}, 'properties': {'ID_TRC': 1010001, 'DEB_GCH': 12320, 'FIN_GCH': 12340}}

The first three lines in json file were:

{"name":"geobase","type":"FeatureCollection"
,"features":[
{"type":"Feature","geometry":{"type":"LineString","coordinates":[[-73.7408048408216,45.5189595588608],[-73.7408749973688,45.5189893490944],[-73.7409267622838,45.5190212771795],[-73.7418867072278,45.5196401086022],[-73.7419636417947,45.5196917400376]]},"properties":{"ID_TRC":1010001,"DEB_GCH":12320,"FIN_GCH":12340}}

Following that, the copy command into my postgres table is:

psql -h (host) -U (user) -d (database)  -c "\COPY geometries.geobase_tmp(id,name,type,features) FROM '.../myjson.csv' with (format csv,header true, delimiter E'\t');"

results in my table filled with name,type and features. First feature (a text field) is for example the following string:

{'type': 'Feature', 'geometry': {'type': 'LineString', 'coordinates': [[-73.7408048408216, 45.5189595588608], [-73.7408749973688, 45.5189893490944], [-73.7409267622838, 45.5190212771795], [-73.7418867072278, 45.519640108602204], [-73.7419636417947, 45.5196917400376]]}, 'properties': {'ID_TRC': 1010001, 'DEB_GCH': 12320, 'FIN_GCH': 12340}}

In Postgres, when I try to read from this tmp table into another one:

SELECT features::json AS fc FROM geometries.geobase_tmp

I get the error :

SQL Error [22P02]: ERROR: invalid input syntax for type json
Detail : Token "'" is invalid.
  Where : JSON data, line 1: {'...

It's like if Postgres expects double quotes and not single quotes to parse the json text. What can I do to avoid this problem?

EDIT: I followed the procedure described here (datatofish.com/json-string-to-csv-python) to convert json to csv. The source (the json txt file) is a valid json and contains only double quotes. After conversion, it's not a valid json anymore (it contains single quotes instead of double quotes). Is there a way to output a csv while keeping the double quotes?

Patrick
  • 2,577
  • 6
  • 30
  • 53
  • 1
    Yes, JSON *requires* double quotes. What you have isn't JSON. – deceze Aug 09 '21 at 18:17
  • See edit for details – Patrick Aug 09 '21 at 18:23
  • 1
    I assume that json-like string that is outputted from your `to_csv` had single quotes in it in the`myjson.txt` file as well, right? I think the real answer here is how to switch from single quotes to double quotes while that data is sitting in your pandas dataframe. Then you will have a proper JSON string outputted in the `to_csv()` call and postgres can then handle it. – JNevill Aug 09 '21 at 18:27
  • No, that's what is puzzling me. The json txt file is a valid json, it contains only double quotes. – Patrick Aug 09 '21 at 18:32
  • A different way is acceptable – Patrick Aug 09 '21 at 18:42

1 Answers1

1

I figured it out:

Json to csv:

import pandas as pd
import json
import csv

df = pd.read_json('myjson.txt')
df['geom'] = df['features'].apply(lambda x:json.dumps(x['geometry']))

df['properties'] = df['features'].apply(lambda x:json.dumps(x['properties']))
df[['geom','properties']].to_csv('myjson.csv',sep='\t',quoting=csv.QUOTE_ALL)

Now CSV file looks like:

""      "geom"  "properties"
"0"     "{""type"": ""LineString"", ""coordinates"": [[-73.7408048408216, 45.5189595588608], [-73.7408749973688, 45.5189893490944], [-73.7409267622838, 45.5190212771795], [-73.7418867072278, 45.519640108602204], [-73.7419636417947, 45.5196917400376]]}"    "{""ID_TRC"": 1010001, ""DEB_GCH"": 12320, ""FIN_GCH"": 12340}"
...

Postgres tmp table created with:

CREATE TABLE geometries.geobase_tmp (
  id int,
  geom TEXT,
  properties TEXT
)

Copy CSV content into tmp table:

psql -h (host) -U (user) -d (database)  -c "\COPY geometries.geobase_tmp(id,geom,properties) FROM 'myjson.csv' with (format csv,header true, delimiter E'\t');"

Creation of final postgres table which contains geometry and properties (each property in its own field):

drop table if exists geometries.troncons;

SELECT
  row_number() OVER () AS gid,
  ST_GeomFromGeoJSON(geom)  as geom,
  properties::json->'ID_TRC' AS ID_TRC,
  properties::json->'DEB_GCH' AS DEB_GCH,
  properties::json->'FIN_GCH' AS FIN_GCH
INTO TABLE geometries.troncons
FROM geometries.geobase_tmp
Patrick
  • 2,577
  • 6
  • 30
  • 53