
I have a Python script that adds many rows to a Postgres table using INSERT. I want to use COPY instead for speed. This answer gets halfway there, but does not explain how to format lists, booleans, etc.

With INSERT, psycopg2 handles formatting for you:

foo = [0,1,2]
bar = '"Hello," she said'
cur.execute("insert into table (foo, bar) values (%s, %s)", (foo, bar))

However, this doesn't work with COPY, as you have to supply CSV-formatted data:

import io

foo = [0,1,2]
bar = '"Hello," she said'
csv_string = "\t".join([str(foo), str(bar)])  # join takes a single iterable
buf = io.StringIO()
buf.write(csv_string)
buf.seek(0)
cur.copy_from(buf, 'table')
# Does not work, as the data is not formatted properly

Formatting with the csv writer doesn't work either:

import csv

writer = csv.writer(buf)
writer.writerow([foo, bar])
buf.seek(0)
cur.copy_from(buf, 'table')
# Fails on lists, which get written as [0, 1, 2] instead of {0,1,2},
# and on NULL values; csv.writer also emits commas, while copy_from
# expects tab-separated fields by default

How do I format my data into Postgres-compatible CSV strings? I tried cur.mogrify, but it formats lists as ARRAY[0,1,2] instead of {0,1,2}, and copy_from expects the latter.

I suppose I could try to roll my own string formatter, but surely there is a better way?
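
For concreteness, the kind of formatter I mean would look something like the sketch below (the helper name pg_format is made up, and it only handles lists, booleans, None and plain strings):

def pg_format(value):
    # Illustrative sketch only: render a Python value in COPY's text format.
    if value is None:
        return r"\N"  # COPY's default NULL marker
    if isinstance(value, bool):
        return "t" if value else "f"  # Postgres boolean literals
    if isinstance(value, (list, tuple)):
        # Postgres array literal; element quoting and nested NULLs not handled
        return "{" + ",".join(pg_format(v) for v in value) + "}"
    # Escape the backslash and the separators COPY's text format uses
    return str(value).replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")

line = "\t".join(pg_format(v) for v in ([0,1,2], '"Hello," she said'))
# line == '{0,1,2}\t"Hello," she said'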

Agargara
  • You don't format data for COPY, as it natively copies CSV – Vao Tsun Sep 21 '17 at 08:05
  • Do not use `copy`. Use [`execute_values`](http://initd.org/psycopg/docs/extras.html#psycopg2.extras.execute_values) instead. – Clodoaldo Neto Sep 21 '17 at 08:09
  • @ClodoaldoNeto: the whole point is that I want to use COPY for speed improvement. Using insert and execute_values is slow. – Agargara Sep 21 '17 at 08:12
  • The whole point of `execute_values` is that it is much faster than `insert`. – Clodoaldo Neto Sep 21 '17 at 08:17
  • If using csv, you have to use `copy_expert()` so that you can pass the required format argument(s). Search a bit, lots of examples. – Ilja Everilä Sep 21 '17 at 08:21
  • @ClodoaldoNeto: `execute_values` still uses `insert`. I suppose you mean that it's faster to insert many values at once with `execute_values` rather than one at a time with `execute`, which is true. However `copy` is still 6.5x faster according to this benchmark: https://gist.github.com/jsheedy/efa9a69926a754bebf0e9078fd085df6 – Agargara Sep 22 '17 at 08:59
  • @IljaEverilä I am aware of `copy_expert`. The problem is how to format the CSV data in the first place. I have done lots of searching but could not find any examples of how to turn Python data types into Postgres-compatible CSV. (Formatting arrays, null types, etc. properly.) If you have found any, please post a link! – Agargara Sep 22 '17 at 09:02
  • Missed the real point of your post, sorry. The use of the term Comma Separated Values threw me off, since you explicitly have to tell COPY to parse such (compared to the tab separated default format, the usage of which I also missed in your code). – Ilja Everilä Sep 22 '17 at 09:14
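
For reference, a minimal sketch of the execute_values approach suggested in the comments above, assuming the question's placeholder table and columns (the table name is quoted because table is a reserved word):

from psycopg2.extras import execute_values

rows = [([0, 1, 2], '"Hello," she said'),
        ([3, 4, 5], None)]
# psycopg2 adapts Python lists to Postgres arrays and None to NULL
execute_values(cur, 'INSERT INTO "table" (foo, bar) VALUES %s', rows)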

1 Answer


The following example works:

import io

foo = [0,1,2]
bar = '"Hello," she said'
csv_string = str(foo) + "\t" + str(bar)
print(csv_string)
buf = io.StringIO()
buf.write(csv_string)
buf.seek(0)
cur.copy_from(buf, 'table')

The difference between your code and the code above is the `csv_string = ...` line.

Anyway, I recommend using copy_expert instead of copy_from, as it is the more flexible option.
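
For example, a minimal copy_expert sketch along those lines, assuming the array value is pre-rendered as {0,1,2} by hand and using the question's placeholder table name (quoted, since table is a reserved word):

import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# The array is pre-rendered as {0,1,2}; csv.writer handles quoting of the string
writer.writerow(["{0,1,2}", '"Hello," she said'])
buf.seek(0)
# FORMAT csv makes COPY parse commas and double quotes; unquoted empty fields are NULL
cur.copy_expert('COPY "table" (foo, bar) FROM STDIN WITH (FORMAT csv)', buf)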