I am trying to load a CSV file from S3 into a Redshift table using Python. I used boto3 to pull the data from S3, pandas to convert the data types (timestamp, string and integer), and then tried to upload the dataframe to the table with to_sql (SQLAlchemy). It fails with the error:
cursor.executemany(statement, parameters)
psycopg2.errors.StringDataRightTruncation: value too long for type character varying(256)
Additional info: the string column contains a large amount of mixed data. I am also able to write the output as a CSV on my local machine without any problem.
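One quick check I can think of (just a sketch, using the same df and 'text' column that appear in my code below) would be to look at the longest string in that column, since the error suggests something exceeds 256 characters:

print(df['text'].astype(str).str.len().max())  # length of the longest value in the text column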
My code is as follows:
import io
import boto3
import pandas as pd
from sqlalchemy import create_engine

# Read the CSV object from S3 into a dataframe
client = boto3.client('s3', aws_access_key_id="",
                      aws_secret_access_key="")
response = client.get_object(Bucket='', Key='*.csv')
file = response['Body'].read()
df = pd.read_csv(io.BytesIO(file))

# Convert the column types
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
df['text'] = df['text'].astype(str)
df['count'] = df['count'].fillna(0).astype(int)

# Write the selected columns to the Redshift table
con = create_engine('postgresql://*.redshift.amazonaws.com:5439/dev')
select_list = ['date', 'text', 'count']
df = df[select_list]
df.to_sql('test', con, schema='parent', index=False, if_exists='replace')
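From the error, my guess is that to_sql is creating the text column as character varying(256) by default and some of my strings are longer than that. Would explicitly passing a dtype mapping, along these lines, be the right direction? (The VARCHAR length below is only a guess on my part, not something I have confirmed for Redshift.)

from sqlalchemy.types import VARCHAR

# guess: force a wider varchar for the long text column instead of the default 256
df.to_sql('test', con, schema='parent', index=False, if_exists='replace',
          dtype={'text': VARCHAR(65535)})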
I am a beginner, so please help me understand what I am doing wrong. Please ignore any typos. Thanks.