I'm trying to migrate data from a MySQL DB to HANA using Python. We currently do this migration manually at work, but the plan is to run a script every day that collects the prior day's data (stored in MySQL) and moves it to HANA so we can use its analytics tools. I have written a script with two functions: one connects to MySQL and temporarily stores the query result in a Pandas DataFrame; the second uses the sqlalchemy-hana connector to create an engine that I pass to Pandas' to_sql function to write the data to HANA.
Below is the first function, which connects to MySQL:
import pandas
import mysql.connector as myscon
from mysql.connector import errorcode

def connect_to_mysql(query):
    df = None
    stagedb = None
    try:
        # connect to the MySQL staging db
        stagedb = myscon.connect(
            user='user-name',
            password='password',
            host='awshost.com',
            database='sampletable',
            raise_on_warnings=True)
        # read the query result straight into a DataFrame
        df = pandas.read_sql(query, stagedb)
    except myscon.Error as err:
        if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
            print('Incorrect user name or password')
        elif err.errno == errorcode.ER_BAD_DB_ERROR:
            print('Database does not exist')
        else:
            print(err)
    finally:
        # close the connection if it was opened
        if stagedb is not None:
            stagedb.close()
    return df
This is the second function, which connects to HANA:
from sqlalchemy import create_engine

def connect_to_hana(query):
    # connect to the HANA db via the sqlalchemy-hana dialect
    try:
        engine = create_engine('hana://username:password@host:port')
        # get the DataFrame returned by the first function
        to_df = connect_to_mysql(query)
        # bulk-insert the DataFrame into HANA
        to_df.to_sql('sample_data', engine, if_exists='append', index=False, chunksize=20000)
    except:
        raise
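For context, I kick the whole thing off with a single call along these lines (the table name and date filter here are just placeholders for my real query):

    # placeholder query - the real one selects yesterday's rows from our staging table
    connect_to_hana("SELECT * FROM daily_events WHERE event_date = CURDATE() - INTERVAL 1 DAY")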
My HANA DB has several schemas in the catalog folder, many of them "SYS" or "_SYS" related. I have created a separate schema to test my code on and play around in, and it has the same name as my username.
My questions are as follows: 1) Is there a more efficient way to load data from MySQL into HANA without a go-between like a CSV file or, in my case, a Pandas DataFrame? Running the script from VS Code takes around 90 seconds to complete. 2) When using the sqlalchemy-hana connector, how does it know which schema to create the table in and append the data to? The README didn't really explain this. Luckily it's storing the data in the right schema (the one named after my username), but I created another schema as a test and, of course, the table didn't show up under that one. If I try to specify the database in the create_engine line like so:
engine = create_engine('hana://username:password@host:port/Username')
I get this error: TypeError: connect() got an unexpected keyword argument 'database'.
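What I was hoping to do instead is point to_sql at a specific schema, something like the line below, if Pandas' schema argument is the right way to do that with this connector (the schema name is just an example):

    # untested - relying on to_sql's schema argument to target a specific HANA schema
    to_df.to_sql('sample_data', engine, schema='TEST_SCHEMA',
                 if_exists='append', index=False, chunksize=20000)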
Also, I noticed that if I run my script twice and then count the rows in the created table, the rows are added twice, essentially creating duplicates. Because of this, 3) would it be better to iterate through the rows of the DataFrame and insert them one by one using the pyhdb package?
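If row-by-row inserts are the better route, this is roughly what I had in mind with pyhdb (connection details and column names are placeholders, and I'm not sure of the exact parameter marker style pyhdb expects):

    import pyhdb

    connection = pyhdb.connect(host='host', port=30015, user='username', password='password')
    cursor = connection.cursor()
    # placeholder column names; unsure whether pyhdb wants '?' or '%s' parameter markers
    insert_sql = 'INSERT INTO sample_data (col_a, col_b) VALUES (?, ?)'
    for row in to_df.itertuples(index=False):
        cursor.execute(insert_sql, (row.col_a, row.col_b))
    connection.commit()
    connection.close()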
Any advice/suggestions/answers will be very much appreciated! Thank you!