
I am working on a project that receives information from the GitHub API and analyzes it. The problem is that when I try to insert this much data (for example, into files I insert a list of lists, as seen below, which has 19k items in it), it takes a very long time. How can I improve that? Thanks.

        for i in commits_array:
            for j in i[-1]:
                self.insert_value_to_files_DB(j)


    def insert_value_to_files_DB(self, array):
        try:
            sql = ("INSERT INTO files "
                   "(file_count, file_sha, file_name, file_status, file_additions, "
                   "file_deletions, file_changes, commit_sha) "
                   "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")
            self.cursor.execute(sql, array)
            self.connection.commit()
        except mysql.connector.Error as error:
            print("Failed to insert into MySQL table {}".format(error))
(1) Consider using `executemany` instead of `execute`, and only a single `commit`. (2) Write the data to a CSV and use MySQL's `LOAD DATA INFILE` statement. – snakecharmerb Nov 18 '20 at 21:00
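A minimal sketch of the `executemany` suggestion, assuming the same `self.cursor` and `self.connection` attributes as in the question; the method name `insert_files_bulk`, the flattening step, and the added rollback are illustrative, not part of the original code:

    def insert_files_bulk(self, rows):
        # `rows` is a list of 8-item sequences matching the column order below.
        sql = ("INSERT INTO files "
               "(file_count, file_sha, file_name, file_status, file_additions, "
               "file_deletions, file_changes, commit_sha) "
               "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")
        try:
            self.cursor.executemany(sql, rows)  # one batched statement
            self.connection.commit()            # one commit for all rows
        except mysql.connector.Error as error:
            self.connection.rollback()
            print("Failed to insert into MySQL table {}".format(error))

    # Usage: flatten the nested list once, then insert in a single call.
    rows = [j for i in commits_array for j in i[-1]]
    self.insert_files_bulk(rows)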

1 Answer


19,000 rows isn't huge for a dbms. It's small.

Before you start inserting rows, do

    self.connection.start_transaction()

Then, after every hundred rows or so, do

    self.connection.commit()
    self.connection.start_transaction()

Then, when you're done, do a final commit().

On inserts, it's the commit operations that take time. Putting multiple rows into one commit makes things faster.

O. Jones
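
Putting the answer's advice together with the loop from the question, a sketch of the batching pattern might look like this (assuming `insert_value_to_files_DB` no longer calls `commit` itself; the batch size of 100 is just the "every hundred rows or so" figure from the answer):

    self.connection.start_transaction()
    count = 0
    for i in commits_array:
        for j in i[-1]:
            self.insert_value_to_files_DB(j)  # per-row commit removed
            count += 1
            if count % 100 == 0:              # commit every hundred rows or so
                self.connection.commit()
                self.connection.start_transaction()
    self.connection.commit()                  # final commit for the remainder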