1

As part of my homework I need to load large data files into two MySQL tables, parsed using Python, on my guest machine that is called via Vagrant SSH.

I also then need to run a Sqoop job on one of the 2 tables so now I'm up to the point where I loaded one of the tables successfully and ran the Python script to load the second table and it's been more than 3 hours and still loading.

I was wondering whether I could complete my Sqoop job on the already loaded table instead of staring at a black screen for almost 4 hours now.

My questions are:

  1. Is there any other way to Vagrant SSH into the same machine without doing Vagrant reload (because --reload eventually shuts down my virtual machine thereby killing all the current jobs running on my guests).

  2. If there is, then given that I open a parallel window to log in to the guest machine as usual and start working on my Sqoop job on the first table that already loaded; will it any way affect my current job with the second table that is still loading? Or will it have a data loss as I can't risk re-doing it as it is super large and extremely time-consuming.

  3. python code goes like this

    ~~ def parser():

    with open('1950-sample.txt', 'r', encoding='latin_1') as input:

        for line in input:
    

    ....

    Inserting into tables

def insert():

if (tablename == '1950_psr'):

    cursor.execute("INSERT INTO 1950_psr (usaf,wban,obs_da_dt,lati,longi,elev,win_dir,qc_wind_dir, sky,qc_sky,visib,qc_visib,air_temp,qc_air_temp,dew_temp,qc_dew_temp,atm_press,qc_atm_press)VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",(USAF,WBAN,obs_da_dt,lati,longi,elev,win_dir,qc_wind_dir, sky,qc_sky,visib,qc_visib,air_temp,qc_air_temp,dew_temp,qc_dew_temp,atm_press,qc_atm_press))


elif (tablename == '1986_psr'):

    cursor.execute("INSERT INTO 1986_psr (usaf,wban,obs_da_dt,lati,longi,elev,win_dir,qc_wind_dir, sky,qc_sky,visib,qc_visib,air_temp,qc_air_temp,dew_temp,qc_dew_temp,atm_press,qc_atm_press)VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",(USAF,WBAN,obs_da_dt,lati,longi,elev,win_dir,qc_wind_dir, sky,qc_sky,visib,qc_visib,air_temp,qc_air_temp,dew_temp,qc_dew_temp,atm_press,qc_atm_press))

parser()

Saving & closing

conn.commit()

conn.close()
  • 1
    *"... and run the Python script to load the second table and it's been more than 3 hours and still loading"* - Please show the relevant code and state the exact problem or error. Also see [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) and [How can mysql insert millions records faster?](https://stackoverflow.com/q/19682414/608639) – jww Mar 23 '19 at 00:39
  • There is no error yet.The screen is stuck at "File Exists, opened database successfully" after i ran "python3 parser.py 1986.txt(filename 1986_psr(table name). The file itself is 2.6G.It sure has more than 600000 records. – Preethi Vasanth Mar 23 '19 at 01:29

1 Answers1

0

I don't know what's in your login scripts, and I'm not clear what that --reload flag is, but in general you can have multiple ssh sessions to the same machine. Just open another terminal and ssh into the VM.

However, in your case, that's probably not a good idea. I suspect that the second table is taking long time to load because your database is reindexing or it's waiting on a lock to be released.

Unless you are loading hundreds of meg's, I suggest you first check for locks and see what queries are pending.

Even if you are loading very large dataset and there are no constraints on the table you need for you script, you are just going to pile up on a machine that's already taxed pretty heavily...

ventsyv
  • 3,316
  • 3
  • 27
  • 49
  • Hi.thanks for your answer. by --reload I meant vagrant reload --provision . I usually do this before ssh to bring up the machine. And the 2nd table is still loading!!!! It is more than megs, should be in gigs.Could you please tell me how to check for locks . TIA! – Preethi Vasanth Mar 23 '19 at 01:18
  • Oh, I see. You only need to do that the first time. Once you environment is up, you can just connect to it. – ventsyv Mar 23 '19 at 17:24