
I am using the copy_expert method in psycopg2 to copy data from a CSV file to a PostgreSQL table. I have a Postgres table like this:

create table my_table (
  cola          text,
  colb          text,
  colc          text,
  cold          text,
  cole          text,
  colf          text,
  colg          text
)

And a CSV containing data for the first five columns, like this:

cola,colb,colc,cold,cole
1,foo,a,10,vvv
2,bar,b,20,www
3,baz,c,30,xxx
4,boo,d,40,yyy
5,baa,e,50,zzz

I would like to copy the CSV data for the first five columns, while also specifying values for colf and colg (each row should have the same value for colf and colg).

I can copy the first five columns to my table like this:

import psycopg2

conn = psycopg2.connect('dbname=name user=username')
cur = conn.cursor()
copy_sql = """
  copy my_table (cola, colb, colc, cold, cole)
  from stdin with
    csv
    header
    delimiter as ','
"""
from_csv = '/path/to/data.csv'
with open(from_csv, 'r') as f:
  cur.copy_expert(sql=copy_sql, file=f)
  conn.commit()
  cur.close()

How can I also specify values for the last two columns using Python? I know that I can specify default values in the table DDL, like this:

create table my_table (
  cola          text,
  colb          text,
  colc          text,
  cold          text,
  cole          text,
  colf          text default 'foo',
  colg          text default 'bar'
)

But I'd like to add the values using Python, since each CSV upload will have its own values for colf and colg, and these values are determined by logic in my Python code.


1 Answer


Looks like there are a couple of ways to do this by first adding the columns I need to the data, and then uploading the updated data.

Using the petl package:

import psycopg2
from petl import fromcsv, addfield, todb

csv_file = '/path/to/data.csv'
table = fromcsv(csv_file)
table = addfield(table, 'colf', 'Some value')
table = addfield(table, 'colg', 'Another value')

conn = psycopg2.connect('dbname=test user=user')
todb(table, conn, 'my_table')

This works okay on small data, but it's incredibly slow on large data. The psycopg2 copy_from and copy_expert commands seem to be much faster, since they use PostgreSQL's bulk COPY. I was able to copy my data using copy_from by first converting my CSV file into a pandas DataFrame:

import psycopg2
import pandas as pd
from io import StringIO

csv_file = '/path/to/file'
df = pd.read_csv(csv_file)
df['colf'] = 'My value'
df['colg'] = 'Foobar'

In order to make use of the psycopg2 copy_* methods, I need to convert the DataFrame into a file-like object with read() and readline() methods, which I can do using StringIO:

buf = StringIO()
df.to_csv(buf, header=False, index=False)
buf.seek(0)  # rewind so copy_from reads from the start of the buffer

Note that you need to seek back to the start of the buffer, because DataFrame.to_csv leaves the stream position at the end. See this SO answer for an explanation.

Then I can copy that buffer object:

conn = psycopg2.connect('dbname=test user=user')
cur = conn.cursor()
cur.copy_from(buf, 'my_table', sep=',')
conn.commit()
cur.close()
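
For reference, the same buffer can also be loaded with copy_expert and an explicit column list, which mirrors the COPY statement from the question instead of relying on the table's column order. This is a minimal sketch that reuses the table name, column names, and connection string from the examples above:

import psycopg2
import pandas as pd
from io import StringIO

# Build the buffer as before: the CSV data plus the two constant columns.
df = pd.read_csv('/path/to/data.csv')
df['colf'] = 'My value'
df['colg'] = 'Foobar'

buf = StringIO()
df.to_csv(buf, header=False, index=False)
buf.seek(0)  # rewind so copy_expert reads from the start

conn = psycopg2.connect('dbname=test user=user')
cur = conn.cursor()
copy_sql = """
  copy my_table (cola, colb, colc, cold, cole, colf, colg)
  from stdin with csv
"""
cur.copy_expert(sql=copy_sql, file=buf)
conn.commit()
cur.close()

One advantage of the csv format here is that quoted fields (for example, values that contain commas) are handled correctly, whereas copy_from with sep=',' expects plain unquoted text.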
Will this work if you have non-string types in your postgres table? I tried your code but the table remains empty after my attempt at bulk upload, so I thought maybe it's because of a type mismatch. – Boris Jan 20 '21 at 11:51