Parallelising a select query in Python's sqlite does not seem to improve performance

Question

After reading this post, I have been trying to compare parallelization with non parallelization in sqlite. I am using Python's sqlite3 library to create a database containing a table called randNums which contains a two columns, an id and a val. The val is a random number between 0 and 1. I then select all rows with val greater than a half. I have done this twice so as to compare run times of the parallelized version and unparallelized version, however they take the same amount of time. I'd like to know if I am using the keyword 'PARALLEL' incorrectly, or if I need to first enable parallelization with a python command.

Finally, I'd also like to know how parallelization differs for different databases, for example, mysql and postgresql.

import sqlite3
from time import time

con = sqlite3.connect('mydatabase.db')
cursorObj = con.cursor()

cmd  = ['PRAGMA table_info(randNums);']
cmd += ['SELECT count (*) from randNums where val>.5;']
cmd += ['SELECT count /*+ PARALLEL(4) */ (*) from randNums where val>.5;']

for c in cmd:
    t = time()
    cursorObj.execute(c)
    print('command: %s' % c)
    print(cursorObj.fetchall())
    print('run time in seconds: %.3f\n\n' % (time()-t))

Running a Python script containing the above code results in the following output:

command: PRAGMA table_info(randNums);
[(0, 'id', 'INTEGER', 0, None, 1), (1, 'val', 'REAL', 0, None, 0)]
run time in seconds: 0.000


command: SELECT count (*) from randNums where val>.5;
[(49996009,)]
run time in seconds: 3.604


command: SELECT count /*+ PARALLEL(4) */ (*) from randNums where val>.5;
[(49996009,)]
run time in seconds: 3.598

I first generated the database with the following code:

import sqlite3
from random import uniform as rand

con = sqlite3.connect('mydatabase.db')
cursorObj = con.cursor()
cursorObj.execute("CREATE TABLE IF NOT EXISTS randNums(id integer PRIMARY KEY, val real)")
try:
    for i in range(10**8):
        if i%10**5==0: print(i)
        cursorObj.execute("INSERT INTO randNums VALUES(%d, '%f')" % (i,rand(0,1)))
except:
    print('datbase is already filled with data')
    pass
con.commit()

The post you are referring to is about Oracle, I'm not sure if it has an effect in SQLite. — mkrieger1, Feb 21 '23 at 23:07
Good to know, I'd like to know for various SQL variants how one parallelizes a query. In particular I am interested in doing so in python with mysql or sqlite, but I'd also like to do this in other settings such as in an Oracle database. Thanks — Mathew, Feb 22 '23 at 00:52
AFAIK, SQLite is very basic and does not support such feature (it actually does not support many feature). It is pretty inefficient for most operations. If performance matters and you are working on a large database, then SQLite is certainly not the right tool. — Jérôme Richard, Feb 25 '23 at 15:50

score 3 · Answer 1 · answered Feb 25 '23 at 20:28

SQLite does not support parallelization out of the box, and the PARALLEL keyword that you're using in your code has no effect on the execution of your query. The reason your parallel and non-parallel queries take the same amount of time is likely due to the fact that SQLite is not designed for parallel execution and is limited to running queries on a single thread.

To achieve parallelization with SQLite, you can explore the possibility of using external libraries or tools, such as the SQLite Virtual Table extension for parallel queries or utilizing multi-threading in Python to run multiple queries in parallel.

score 0 · Answer 2 · answered Mar 03 '23 at 00:50

0

I think you should try PostgreSQL for better parallel processing. SQLite is a in memory DB which is not designed for high load operations.

answered Mar 03 '23 at 00:50

auvipy

769
10
22

Parallelising a select query in Python's sqlite does not seem to improve performance

2 Answers2