
I'm trying to migrate data from a MySQL database to HANA using Python. We currently perform this migration manually at work, but the plan is to run a script every day that collects the prior day's data (stored in MySQL) and moves it to HANA so we can use its analytics tools. I have written a script with two functions: one connects to MySQL and temporarily stores the query result in a Pandas DataFrame; the second uses the sqlalchemy-hana connector to create an engine that I feed into Pandas' to_sql function to store the data in HANA.

Below is the first function, which connects to MySQL:

import pandas
import mysql.connector as myscon
from mysql.connector import errorcode

def connect_to_mysql(query):
    df = None
    stagedb = None
    try:
        #connect to the db
        stagedb = myscon.connect(
            user='user-name',
            password='password',
            host='awshost.com',
            database='sampletable',
            raise_on_warnings=True)

        #run the query and load the result into a DataFrame
        df = pandas.read_sql(query, stagedb)

    except myscon.Error as err:
        if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
            print('Incorrect user name or password')
        elif err.errno == errorcode.ER_BAD_DB_ERROR:
            print('Database does not exist')
        else:
            print(err)
    finally:
        #close the connection whether or not the query succeeded
        if stagedb is not None:
            stagedb.close()

    return df

This is the second function, which connects to HANA:

from sqlalchemy import create_engine

def connect_to_hana(query):
    #connect to HANA db
    engine = create_engine('hana://username:password@host:port')
    #get the DataFrame returned by the first function
    to_df = connect_to_mysql(query)
    if to_df is not None:
        #bulk-insert in chunks of 20,000 rows, appending to the existing table
        to_df.to_sql('sample_data', engine, if_exists='append', index=False, chunksize=20000)

My HANA DB has several schemas in the catalog folder, many of them "SYS" or "_SYS" related. I have created a separate schema to test my code on and play around in, which has the same name as my username.

My questions are as such: 1) Is there a more efficient way to load data from MySQL to HANA without a go-between like a CSV file or, in my case, a Pandas DataFrame? Using VS Code, it takes around 90 seconds for the script to complete. And 2) when using the sqlalchemy-hana connector, how does it know which schema to create the table in and store/append the data to? The read-me file didn't really explain. Luckily it's storing it in the right schema (the one with my username), but I created another one as a test and of course the table didn't show up under that one. If I try to specify the database in the create_engine line like so:

engine = create_engine('hana://username:password@host:port/Username')

I get this error: TypeError: connect() got an unexpected keyword argument 'database'.
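Is the right approach instead to pass the schema to to_sql directly? Pandas' to_sql does take a schema argument, so I imagine something along these lines, where TEST_SCHEMA is a stand-in for my test schema's name:

import pandas
from sqlalchemy import create_engine

engine = create_engine('hana://username:password@host:port')
to_df = pandas.DataFrame({'id': [1, 2], 'val': ['a', 'b']})  #stand-in for the MySQL result

#target a specific schema instead of relying on the user's default schema
to_df.to_sql('sample_data', engine, schema='TEST_SCHEMA',
             if_exists='append', index=False, chunksize=20000)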

Also, I noticed that if I run my script twice and count the number of rows in the created table, it adds the rows twice - essentially creating duplicates. Because of this, 3) would it be better to iterate through the rows of the DataFrame and insert them one by one using the pyhdb package?
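Or would it be enough to make the nightly run idempotent by deleting that day's rows before appending? Something like this sketch, where record_date is a made-up column name standing in for whatever date column the real table has:

import datetime
from sqlalchemy import create_engine, text

engine = create_engine('hana://username:password@host:port')
yesterday = datetime.date.today() - datetime.timedelta(days=1)

#delete any rows already loaded for the target day, so a re-run of the
#script does not insert them a second time; 'record_date' is a placeholder
engine.execute(text('DELETE FROM sample_data WHERE record_date = :d'), d=yesterday)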

Any advice/suggestions/answers will be very much appreciated! Thank you!

Nirav

1 Answer


Gee... that seems like a rather complicated workflow. Alternatively, you may want to check out the HANA features Smart Data Access (SDA) and Smart Data Integration (SDI). With these, you could either establish "virtual" data access in SAP HANA, that is, read the data from the MySQL DB into the HANA process only when you run your analytics query, or actually load the data into HANA, making it a data mart.
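Roughly, the SDA route comes down to two SQL statements rather than a Python pipeline. A sketch only, with made-up source, DSN, schema, and table names - check the Administration Guide for the exact syntax on your HANA version:

from sqlalchemy import create_engine

engine = create_engine('hana://username:password@host:port')

#register the MySQL DB as a remote source via the generic ODBC adapter;
#the source name, ini file, DSN, and credentials below are all placeholders
engine.execute("""
    CREATE REMOTE SOURCE "MYSQL_SRC" ADAPTER "odbc"
    CONFIGURATION FILE 'property_mysql.ini'
    CONFIGURATION 'DSN=mysql_dsn'
    WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=mysql_user;password=secret'
""")

#expose a MySQL table inside HANA; queries against the virtual table are
#federated to MySQL at runtime, so no nightly copy/append step is needed
engine.execute("""
    CREATE VIRTUAL TABLE "MYSCHEMA"."V_SAMPLE_DATA"
    AT "MYSQL_SRC"."<NULL>"."sampletable"."sample_data"
""")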

If it is really just about the "piping" for this data transfer, I probably wouldn't put 3rd party tools into the scenario. This only makes the setup more complicated than necessary.

Lars Br.
  • Thank you for your response. Doesn't SDA lack support for MySQL? Or at least that is what I've been told by my manager, hence the need for this Python workflow – Nirav Jun 22 '17 at 20:46
  • Please review the documentation (Administration Guide - Data Provisioning). SDA supports generic ODBC access and the option to define custom adapters. – Lars Br. Jun 23 '17 at 02:36