
I have a pandas DataFrame with NULL values in some rows. While inserting into PostgreSQL, some of the nulls in a date-type column turn into the string 'NaT' or 'NaN'. I would like them to be real NULLs, i.e. nothing in that cell.

sample dataframe before insert


import psycopg2
import pandas as pd
import numpy as np

conn = psycopg2.connect(dbname='myDB', host='amazonaws.com',
                        port='2222', user='mysuser', password='mypass')
cur = conn.cursor()

df = pd.DataFrame({'zipcode': [1, np.nan, 22, 88], 'city': ['A', 'h', 'B', np.nan]})

subset = df[['zipcode', 'city']]
data = [tuple(x) for x in subset.values]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into public.MyTable (zipcode, city) values {}'.format(records_list_template)
cur.execute(insert_query, data)
conn.commit()

result in postgresql table


expected result below


PyBoss

2 Answers


You can convert NaN and NaT to None in this way:

df = pd.DataFrame({
    'zipcode': [1, np.nan, 22, 88],
    'city': ['A', 'h', 'B', np.nan],
    'date': pd.to_datetime(['2019-01-01', '2019-01-02', pd.NaT, pd.NaT])})

# strftime needs datetimes, so parse the column first; NaT becomes None
df['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else None for d in df['date']]

subset = df.where((pd.notnull(df)), None)

See DataFrame.where
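Note that depending on the pandas version, `where(..., None)` on a float column may coerce the None back to NaN instead of upcasting; casting to object first avoids that. A sketch of the full conversion pipeline under that assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'zipcode': [1, np.nan, 22, 88],
    'city': ['A', 'h', 'B', np.nan],
    'date': pd.to_datetime(['2019-01-01', '2019-01-02', None, None])})

# Format real dates as strings, map NaT to None
df['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else None
              for d in df['date']]

# Cast to object first so None is not coerced back to NaN in float columns
subset = df.astype(object).where(pd.notnull(df), None)
data = [tuple(x) for x in subset.values]
```

`data` is then ready to pass to the insert from the question, and psycopg2 adapts each None to a SQL NULL.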

klin
  • my bad, my actual table contains a datetype column which has some NaT, the df.where((pd.notnull(df)), None) can't convert NaT – PyBoss Jan 29 '19 at 22:49
  • Indeed, it does not work with `NaT` (I thought it should). I added a conversion from `NaT` to None in a specific column, see the updated answer. – klin Jan 30 '19 at 00:06
  • by the way, do you know how to speed up the process if I have 1 millions rows to insert? Bulk insert? – PyBoss Jan 30 '19 at 01:37
  • Do you have the data in a file? – klin Jan 30 '19 at 01:47
  • it's in pandas dataframe, I can save it as a csv. – PyBoss Jan 30 '19 at 17:19
  • Two options. 1. Save data to csv and use [`copy_from()`](http://initd.org/psycopg/docs/cursor.html#cursor.copy_from). 2. Use [`execute_values()`](http://initd.org/psycopg/docs/extras.html#psycopg2.extras.execute_values) - much more efficient than inserting row by row. – klin Jan 30 '19 at 17:26
  • copy_from() only work in PostgreSQL, unfortunate, I'm using AWS Redshift. execute_values() will work in Redshift, but very slow, i just tested it. Thank you for your help. – PyBoss Feb 01 '19 at 23:57

Convert all instances of NaN in the dataframe to None by replacing them, like this:

df = df.replace({np.nan: None})

(The original answer used `pd.np.nan`; the `pd.np` accessor was removed in pandas 2.0, so import numpy directly.)
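A quick self-contained check of this approach on the question's data (the frame is upcast to object dtype, so the None values survive):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'zipcode': [1, np.nan, 22, 88],
                   'city': ['A', 'h', 'B', np.nan]})

# replace upcasts affected columns to object and stores real None
df = df.replace({np.nan: None})
```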
Ben Sharkey
  • if you do this, than there will be 'None' in the postgres db. the OP wants to have [null] in the db – bucky Jan 22 '20 at 14:23