
I have a list of dictionaries, for example:

books = [{'name': 'pearson', 'price': 60, 'author': 'Jesse Pinkman'},
         {'name': 'ah publications', 'price': 80, 'author': 'Gus Fring'},
         {'name': 'euclidean', 'price': 120, 'author': 'Skyler White'},
         {'name': 'Nanjial', 'price': 260, 'author': 'Saul Goodman'}]

I need to insert each dictionary into an already created table, taking only the 'author' and 'price' keys. I have around 100k records to insert. Right now I loop through the list of dictionaries, pick out the required key/value pairs, and insert them one by one:

def insert_books(self, val):
    cur = self.con.cursor()
    # one INSERT statement per call, built via string formatting
    # (note: formatting values into SQL like this is also open to SQL injection)
    sql = """insert into testtable values {}""".format(val)
    cur.execute(sql)
    self.con.commit()   # one commit per row
    cur.close()

for i in books:
    result = i['author'], i['price']
    db_g.insert_books(result)   # db_g is an instance holding the connection properties

So, is there a faster and easier way to bulk insert the data, say 10k rows at a time?

Walter White
  • What's your database? Such performance optimizations tend to be rather database-specific. – Florian Weimer Jul 20 '18 at 15:57
  • Possible duplicate of [Insert multiple rows into DB with Python list of Tuples](https://stackoverflow.com/questions/37058984/insert-multiple-rows-into-db-with-python-list-of-tuples) – MoxieBall Jul 20 '18 at 15:58
  • DB: Postgresql. – Walter White Jul 20 '18 at 15:59
  • Some rationale for that dup target -- you're one step away from having a list of tuples, and it will be fewer transactions to do them all at the same time. You should be able to deduce how to do it in one transaction using the info in that post (see the sketch after these comments). – MoxieBall Jul 20 '18 at 16:00
  • That worked perfectly. Thanks @MoxieBall – Walter White Jul 20 '18 at 16:25
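
Following the comments above, a minimal sketch of the list-of-tuples approach, assuming a plain psycopg2 connection (testtable's two columns are taken to be author and price, and insert_books_bulk is a hypothetical helper, not from the question; page_size is one way to batch roughly 10k rows per statement):

from psycopg2.extras import execute_values

def insert_books_bulk(con, books):
    # build the list of (author, price) tuples once
    rows = [(b['author'], b['price']) for b in books]
    cur = con.cursor()
    # execute_values expands the single %s into batches of page_size rows,
    # so 100k records become ~10 statements instead of 100k
    execute_values(cur,
                   "insert into testtable values %s",
                   rows,
                   page_size=10000)
    con.commit()
    cur.close()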

1 Answer


I think a bulk insert that dumps the whole DataFrame at once will be much faster. See: Why Bulk Import is faster than bunch of INSERTs?

import sqlalchemy
import pandas as pd

def db_conn():
    # replace '<connection string>' with your actual database URL
    connection = sqlalchemy.create_engine('<connection string>')
    return connection


books = [{'name': 'pearson', 'price': 60, 'author': 'Jesse Pinkman'},
         {'name': 'ah publications', 'price': 80, 'author': 'Gus Fring'},
         {'name': 'euclidean', 'price': 120, 'author': 'Skyler White'},
         {'name': 'Nanjial', 'price': 260, 'author': 'Saul Goodman'}]

df_to_ingest = pd.DataFrame(books)
df_to_ingest = df_to_ingest[['author', 'price']]   # keep only the columns to insert

df_to_ingest.to_sql('tablename', db_conn(), if_exists='append', index=False)
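
If the frame is large, to_sql can also batch the writes itself; a hedged variant using pandas' chunksize parameter (and method='multi', available from pandas 0.24, which packs many rows into each INSERT statement):

# write ~10k rows per round trip instead of the whole frame at once
df_to_ingest.to_sql('tablename', db_conn(), if_exists='append', index=False,
                    chunksize=10000, method='multi')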

Hope this helps

iamklaus
  • Well you could explain this more clearly. Bulk insert is preferable because it reduces the number of queries. I do not think creating a dataframe from a dictionary would be helpful. Check this out: http://docs.sqlalchemy.org/en/latest/faq/performance.html (see the Core sketch below). – mad_ Jul 20 '18 at 18:26
  • @mad_ hey thanks for pointing that out!! just went through the pd.to_sql, i think there are multiple queries which are still getting executed, but i think this approach is much simpler.. :) – iamklaus Jul 23 '18 at 11:22
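
For completeness, the SQLAlchemy performance FAQ linked above points toward Core-level executemany, which avoids both the ORM and the DataFrame; a minimal sketch, assuming testtable's columns are author and price and reusing the placeholder connection string:

from sqlalchemy import create_engine, text

engine = create_engine('<connection string>')   # placeholder, as above
rows = [{'author': b['author'], 'price': b['price']} for b in books]
with engine.begin() as conn:
    # a list of parameter dicts triggers executemany: one prepared
    # statement executed with many parameter sets, committed on exit
    conn.execute(text("insert into testtable values (:author, :price)"),
                 rows)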