
I am currently trying to insert roughly a billion rows of data into a MySQL table. I am pulling my data from a directory of .JSON files, where each .JSON file contains ~200K rows. There are 5K files in total.

What I am currently doing is going through each file and building a tuple for each row I want to insert. I put these tuples in a list and, after I get through the whole JSON file, insert the list of rows into MySQL. This is faster than inserting one row at a time, but it is still going to take me over 3 days and I don't have the time to spare.

I initially created a list that contained 200,000,000 rows (which was fast to generate), but it took too long to insert into MySQL. That is why I am now only inserting 200,000 rows at a time. Does anyone have any advice on how to speed this up?

import glob
import json
import os

path = *path to my file*
for filename in glob.glob(os.path.join(path, '*.JSON')):
    myList = []
    with open(filename) as json_data:
        j = json.load(json_data)
        for i in j["rows"]:
            name = i["values"][0][0]
            age = i["values"][0][1]
            gender = i["values"][0][2]
            # None fills the table's first (presumably auto-increment) column
            data = (None, name, age, gender)
            myList.append(data)
        cursor = conn.cursor()
        q = """INSERT INTO nordic_data VALUES (%s,%s,%s,%s)"""
        cursor.executemany(q, myList)
        conn.commit()
Shabina Rayan
  • Compare to [this question](https://stackoverflow.com/q/8474926/223424), read [the docs](https://dev.mysql.com/doc/refman/5.7/en/insert-optimization.html). Disable all indexes and constraints at the time of insert. Try using several threads / processes. – 9000 Aug 03 '17 at 02:32
  • Hint: Executing 1 long INSERT query is faster than executing 1000 short INSERT queries. e.g. `INSERT INTO nordic_data VALUES (1,2,3,4),(2,3,4,5), (3,4,5,6)` – Raptor Aug 03 '17 at 02:33
  • Tried [LOAD DATA INFILE](https://dev.mysql.com/doc/refman/5.7/en/load-data.html) ? – Paul Aug 03 '17 at 02:33
  • What I've decided to do is to convert all the 6200 JSON files into CSV files. Then use LOAD DATA INFILE to put it in MySQL. – Shabina Rayan Aug 03 '17 at 04:27
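
Following up on the LOAD DATA INFILE suggestion and the JSON-to-CSV conversion described in the last comment, a minimal sketch of that approach might look like this. The staging path, CSV layout, and column names are assumptions for illustration, not part of the original question:

import csv
import glob
import json
import os

path = "*path to my files*"          # same directory of JSON files as in the question
csv_path = "/tmp/nordic_data.csv"    # assumed staging location for the flattened data

# Flatten every JSON file into one CSV file.
with open(csv_path, "w", newline="") as out:
    writer = csv.writer(out)
    for filename in glob.glob(os.path.join(path, "*.JSON")):
        with open(filename) as json_data:
            for row in json.load(json_data)["rows"]:
                name, age, gender = row["values"][0][:3]
                writer.writerow([name, age, gender])

# Then bulk-load the CSV from the mysql client (column list assumed):
#   LOAD DATA INFILE '/tmp/nordic_data.csv'
#   INTO TABLE nordic_data
#   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
#   LINES TERMINATED BY '\n'
#   (name, age, gender);

Note that LOAD DATA INFILE reads the file on the MySQL server; if the CSV lives on the client machine, the LOAD DATA LOCAL INFILE variant is needed, and it must be enabled on both client and server.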

1 Answer


There's a lot of potential stuff, but a couple of ideas spring to mind:

  1. Wrap every X inserts in a transaction (a sketch follows below).
  2. Drop the indexes from the table, insert X rows, then recreate the indexes.

Inserts have to update indexes, so every insert is modifying the index(es) on the table.
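
A minimal sketch of the first idea, assuming the same conn object, nordic_data table, and myList of row tuples from the question, with a hypothetical batch size:

BATCH_SIZE = 10000  # hypothetical; tune to what your server handles comfortably

cursor = conn.cursor()
q = """INSERT INTO nordic_data VALUES (%s,%s,%s,%s)"""

# Commit once per batch rather than once per row or once per 200K-row file,
# so each transaction stays a bounded size.
for start in range(0, len(myList), BATCH_SIZE):
    cursor.executemany(q, myList[start:start + BATCH_SIZE])
    conn.commit()

For the second idea, the drop/recreate step is plain DDL (DROP INDEX ... ON nordic_data before the bulk insert, CREATE INDEX ... afterwards); whether it pays off depends on how many secondary indexes the table has.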

kimsal