
I have a Python script that executes a query on a database using sqlite3, fetches the rows returned by the query, and writes them to a CSV file. The query returns approximately 66 million rows. When I use Python 3.4.3 on Windows 7 (64-bit, 4 GB RAM, 2.60 GHz) the script takes 201 seconds to finish.

But when I execute the same script on my virtual machine (RHEL5_64, 8 GB, 3.30 GHz) running Python 2.7, it takes approximately 1.5 days to finish.

The code snippet is as follows:

import sqlite3, time, csv

with open('sqlite3dump.csv', 'w', newline='\n') as fp:
    writer = csv.writer(fp, delimiter=',')
    start_time = time.clock()
    conn = sqlite3.connect('detail-config-21-7-2015.sqlite')
    c = conn.cursor()
    c.execute("<some select query>")
    print("Reached this point : 1")
    # Fetch in batches of 20000 rows and write each batch to the CSV;
    # passing the batch (not the cursor) to writerows avoids silently
    # discarding the rows that fetchmany already consumed.
    while True:
        rows = c.fetchmany(size=20000)
        if not rows:
            break
        writer.writerows(rows)

end_time = time.clock()
print("-- Total Time taken : ", (end_time - start_time))
user2400394
  • Does the VM use anything unusual like a network disk? Show the [EXPLAIN QUERY PLAN](http://www.sqlite.org/eqp.html) output in both cases (see the sketch after these comments). – CL. Jul 22 '15 at 08:54
  • sqlite3 in Python 3.4 uses a COVERING INDEX and B-tree in its query plan, which is missing in Python 2.7. This seems to make a large difference in execution time. – user2400394 Jul 22 '15 at 10:52
  • The old Python has an old SQLite version. – CL. Jul 22 '15 at 11:48
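
Both checks raised in the comments can be run from Python itself. The following is a minimal sketch, not part of the original script, assuming the same database file as above; the query placeholder must be replaced with the actual SELECT:

import sqlite3

# SQLite library version this interpreter is linked against; an older
# Python build typically bundles an older SQLite, which may lack
# optimizations such as covering-index scans.
print(sqlite3.sqlite_version)

conn = sqlite3.connect('detail-config-21-7-2015.sqlite')
c = conn.cursor()
# EXPLAIN QUERY PLAN reports the plan SQLite chooses without
# executing the query itself.
for row in c.execute("EXPLAIN QUERY PLAN <some select query>"):
    print(row)
conn.close()

Running this under both interpreters and comparing the printed versions and plans should show whether the covering-index scan noted above accounts for the timing difference.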

0 Answers