4

I need to load data from some source data sources to a Postgres database. To do this task, I first write the data to a temporary CSV file and then load data from the CSV file to Postgres database using COPY FROM query. I do all of this on Python.

The code looks like this:

table_name = 'products'
temp_file = "'C:\\Users\\username\\tempfile.csv'"
db_conn = psycopg2.connect(host, port, user, password, database)
cursor = db_conn.cursor()
query = """COPY """ + table_name + """ FROM """ + temp_file + " WITH NULL AS ''; """
cursor.execute(query)

I want to avoid the step of writing to the intermediate file. Instead, I would like to write to a Python object and then load data to postgres database using COPY FROM file method.

I am aware of this technique of using psycopg2's copy_from method which copies data from a StringIO object to the postgres database. However, I cannot use psycopg2 for a reason and hence, I don't want my COPY FROM task to be dependent on a library. I want it to be Postgres query which can be run by any other postgres driver as well.

Please advise a better way of doing this without writing to an intermediate file.

Community
  • 1
  • 1
user3422637
  • 3,967
  • 17
  • 49
  • 72

2 Answers2

7

You could call the psql command-line tool from your script (i.e. using subprocess.call) and leverage its \copy command, piping the output of one instance to the input of another, avoiding a temp file. i.e.

psql -X -h from_host -U user -c "\copy from_table to stdout" | psql -X -h to_host -U user -c "\copy to_table from stdin"

This assumes the table exists in the destination database. If not, a separate command would first need to create it.

Also, note that one caveat of this method is that errors from the first psql call can get swallowed by the piping process.

khampson
  • 14,700
  • 4
  • 41
  • 43
  • My system requires db password to execute sql,how can i give it in copy command – kaviya .P Dec 11 '19 at 05:02
  • @kaviya.P: The simplest way to accomplish that, especially when using psql in a piped form like above, would be to setup a pgpass file on the box so you don't get prompted for a password. See more info here: https://www.postgresql.org/docs/current/libpq-pgpass.html – khampson Dec 11 '19 at 22:46
7

psycopg2 has integrated support for the COPY wire-protocol, allowing you to use COPY ... FROM STDIN / COPY ... TO STDOUT.

See Using COPY TO and COPY FROM in the psycopg2 docs.

Since you say you can't use psycopg2, you're out of luck. Drivers must understand COPY TO STDOUT / COPY FROM STDIN in order to use them, or must provide a way to write raw data to the socket so you can hijack the driver's network socket and implement the COPY protocol yourself. Driver specific code is absolutely required for this, it is not possible to simply use the DB-API.

So khampson's suggestion, while usually a really bad idea, seems to be your only alternative.

(I'm posting this mostly to make sure that other people who find this answer who don't have restrictions against using psycopg2 do the sane thing.)

If you must use psql, please:

  • Use the subprocess module with the Popen constructor
  • Pass -qAtX and -v ON_ERROR_STOP=1 to psql to get sane behaviour for batching.
  • Use the array form command, e.g. ['psql', '-v', 'ON_ERROR_STOP=1', '-qAtX', '-c', '\copy mytable from stdin'], rather than using a shell.
  • Write to psql's stdin, then close it, and wait for psql to finish.
  • Remember to trap exceptions thrown on command failure. Let subprocess capture stderr and wrap it in the exception object.

It's safer, cleaner, and easier to get right than the old-style os.popen2 etc.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 1
    I keep getting this error unfortunately when i try your solution: 'could not find a "psql" to execute\r\npsql: could not create socket: The requested service provider could not be loaded or initialized.\r\r\n (0x0000277A/10106)\r\n ' – Paris Char Mar 23 '20 at 00:41
  • If i try the rendered command string direct in a cmd box however, it works great. – Paris Char Mar 23 '20 at 00:42
  • @ParisChar I have no idea what weird proxy or tool you're using. Post a new question with details. – Craig Ringer Mar 26 '20 at 14:28
  • @ParisChar I had the same error message and for me it was a broken env. If you pass an `env` to `subprocess.run`, this will be used *instead* of the existing env, not in addition to it. So use `os.environ` and extend it with your other variables, then pass that to `subprocess.run`. Maybe this helps you and others who end up here. – bugmenot123 Aug 10 '23 at 09:49