I'm loading some Redshift data into pandas via pg8000:
>>> df = pd.read_sql(sql_text, pg8000_cnx)
pg8000_cnx
is a connection object to a Redshift database. I always get column names that are byte-quoted, e.g.:
>>> df.columns
Index([b'column_',
b'column_2',
...
b'column_n'],
dtype='object', length=142)
I know I can use .decode('utf-8')
with a list comprehension to convert the columns back to strings, but I was trying to have them loaded as strings from Redshift itself. I tried various CLIENT_ENCODING
settings with the pg8000 cursor, but I still get byte-quoted columns in the pandas DataFrame:
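For reference, the post-load workaround I'd like to avoid looks something like this (a sketch with a dummy DataFrame standing in for the Redshift result):

```python
import pandas as pd

# Dummy DataFrame standing in for the result of pd.read_sql over pg8000,
# where column names come back as bytes.
df = pd.DataFrame({b'column_1': [1], b'column_2': [2]})

# Decode byte column names back to str; leave any already-str names alone.
df.columns = [c.decode('utf-8') if isinstance(c, bytes) else c
              for c in df.columns]
```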
import pg8000
conn = pg8000.connect(**db_details)
cursor = conn.cursor()
cursor.execute("SET CLIENT_ENCODING TO 'UTF8'")
cursor.execute(sql)
cursor.fetchone()
# load df here
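At the "load df here" step I build the DataFrame by hand from the cursor, decoding the names found in cursor.description (per DB-API 2.0, its first element per column is the name). Sketched below with a hypothetical stub cursor in place of a live pg8000 connection:

```python
import pandas as pd

# Hypothetical stub standing in for a pg8000 cursor after cursor.execute(sql).
# description holds 7-item tuples whose first element is the column name,
# which older pg8000 versions return as bytes.
class FakeCursor:
    description = [(b'column_1', None, None, None, None, None, None),
                   (b'column_2', None, None, None, None, None, None)]
    def fetchall(self):
        return [(1, 2), (3, 4)]

cursor = FakeCursor()

# Decode column names from cursor.description, then build the DataFrame.
cols = [d[0].decode('utf-8') if isinstance(d[0], bytes) else d[0]
        for d in cursor.description]
df = pd.DataFrame(cursor.fetchall(), columns=cols)
```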
Is there some Redshift or pg8000 option I'm missing that would return the column headers as UTF-8-decoded strings from the connection object itself?