
Problem Statement:

I have multiple CSV files. I am cleaning them using Python and inserting them into SQL Server using bcp. Now I want to insert the data into Greenplum instead of SQL Server. Please suggest a way to bulk insert into a Greenplum table directly from a Python dataframe.

Solution (what I can think of):

The way I can think of is CSV -> dataframe -> cleaning -> dataframe -> CSV -> then use gpload for the bulk load, and integrate it in a shell script for automation. Does anyone have a better solution for it?
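Roughly what I have in mind (a sketch only; the file names, the cleaning step, and the my_load.yml control file are placeholders, and gpload still needs that control file to describe the source file and the target table):

import subprocess
import pandas as pd

# Clean the raw CSV with pandas and write the cleaned result back out.
df = pd.read_csv('raw_data.csv')
df = df.dropna()  # placeholder for the real cleaning steps
df.to_csv('cleaned_data.csv', index=False)

# Run gpload with a control file that points at cleaned_data.csv
# and the target Greenplum table.
subprocess.run(['gpload', '-f', 'my_load.yml'], check=True)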

Issue in loading data directly from a dataframe to a GP table:

gpload asks for a file path. Can I pass a variable or dataframe to it instead? Is there any other way to bulk load into Greenplum? I don't want to create a CSV or TXT file from the dataframe and then load it into Greenplum.

1 Answer


I would use the psycopg2 and io libraries to do this. io is built-in, and you can install psycopg2 using pip (or conda).

Basically, you write your dataframe to a string buffer ("memory file") in CSV format. Then you use psycopg2's copy_from function to bulk load/copy it into your table.

This should get you started:

import io
import pandas
import psycopg2

# Write your (already cleaned) dataframe to an in-memory buffer as
# tab-separated text, then rewind the buffer so it can be read from the start.
csv_io = io.StringIO()
dataframe.to_csv(csv_io, sep='\t', header=False, index=False)
csv_io.seek(0)


# Connect to the GreenPlum database.
greenplum = psycopg2.connect(host='host', database='database', user='user', password='password')
gp_cursor = greenplum.cursor()

# Copy the data from the buffer into the table. copy_from reads
# tab-separated text by default, which matches the sep='\t' used above.
# 'db.table' is a placeholder for your target table name.
gp_cursor.copy_from(csv_io, 'db.table')
greenplum.commit()

# Close the GreenPlum cursor and connection.
gp_cursor.close()
greenplum.close()
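One caveat: recent psycopg2 releases quote the table name passed to copy_from as a single identifier, so a schema-qualified name like 'db.table' may fail. If that happens, copy_expert with an explicit COPY statement is a drop-in alternative for the copy_from line above (same buffer, same placeholder table name):

# Use instead of the copy_from call above, before greenplum.commit().
gp_cursor.copy_expert("COPY db.table FROM STDIN", csv_io)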
Nick