How to avoid byte quoted column headers when reading redshift data with pg8000

Question

I'm loading some redshift data in pandas via pg8000:

>>>> df = pd.read_sql(sql_text, pg8000_cnx)

pg8000_cnx is a connection object to a redshift database. I always get columns that are byte quoted, ex:

>>> df.columns
Index([b'column_',
       b'column_2',
       ...
       b'column_n'],
       dtype='object', length=142)

I know I can use .decode('utf-8') and some list comprehension to convert the columns back to string, but I was trying to have them loaded in string format from redshift itself. Tried playing with different CLIENT_ENCODING settings with the pg8000 cursor, but still get byte quoted columns in the pandas df.

import pg8000
conn = pg8000.connect(**db_details)
cursor = conn.cursor()
cursor.execute("SET CLIENT_ENCODING TO 'UTF8'")
cursor.execute(sql)
cursor.fetchone()
# load df here

Is there some redshift or pg8000 option I am missing that would load the column headers in utf-8 decoded strings from the db connection object itself?

why are you using pg8000? can you test same using a more defacto library such as Psycopg2? — Jon Scott, May 14 '19 at 08:48
`pg8000` is a perfectly valid choice and is useful in scenarios where you want to reduce the number of dependencies. — Joe Harris, May 14 '19 at 12:59

score 1 · Accepted Answer · answered Oct 31 '20 at 08:48

1

This is fixed in pg8000 1.16.6, so upgrading to the latest version should solve the problem.

So from verion 1.16.6 pg8000 returns column names as str rather than bytes.

answered Oct 31 '20 at 08:48

Tony Locke

454
3
9

How to avoid byte quoted column headers when reading redshift data with pg8000

1 Answers1