1

After reading this post, I have been trying to compare parallelization with non parallelization in sqlite. I am using Python's sqlite3 library to create a database containing a table called randNums which contains a two columns, an id and a val. The val is a random number between 0 and 1. I then select all rows with val greater than a half. I have done this twice so as to compare run times of the parallelized version and unparallelized version, however they take the same amount of time. I'd like to know if I am using the keyword 'PARALLEL' incorrectly, or if I need to first enable parallelization with a python command.

Finally, I'd also like to know how parallelization differs for different databases, for example, mysql and postgresql.

import sqlite3
from time import time

con = sqlite3.connect('mydatabase.db')
cursorObj = con.cursor()

cmd  = ['PRAGMA table_info(randNums);']
cmd += ['SELECT count (*) from randNums where val>.5;']
cmd += ['SELECT count /*+ PARALLEL(4) */ (*) from randNums where val>.5;']

for c in cmd:
    t = time()
    cursorObj.execute(c)
    print('command: %s' % c)
    print(cursorObj.fetchall())
    print('run time in seconds: %.3f\n\n' % (time()-t))

Running a Python script containing the above code results in the following output:

command: PRAGMA table_info(randNums);
[(0, 'id', 'INTEGER', 0, None, 1), (1, 'val', 'REAL', 0, None, 0)]
run time in seconds: 0.000


command: SELECT count (*) from randNums where val>.5;
[(49996009,)]
run time in seconds: 3.604


command: SELECT count /*+ PARALLEL(4) */ (*) from randNums where val>.5;
[(49996009,)]
run time in seconds: 3.598

I first generated the database with the following code:

import sqlite3
from random import uniform as rand

con = sqlite3.connect('mydatabase.db')
cursorObj = con.cursor()
cursorObj.execute("CREATE TABLE IF NOT EXISTS randNums(id integer PRIMARY KEY, val real)")
try:
    for i in range(10**8):
        if i%10**5==0: print(i)
        cursorObj.execute("INSERT INTO randNums VALUES(%d, '%f')" % (i,rand(0,1)))
except:
    print('datbase is already filled with data')
    pass
con.commit()
mkrieger1
  • 19,194
  • 5
  • 54
  • 65
Mathew
  • 1,116
  • 5
  • 27
  • 59
  • 1
    The post you are referring to is about Oracle, I'm not sure if it has an effect in SQLite. – mkrieger1 Feb 21 '23 at 23:07
  • Good to know, I'd like to know for various SQL variants how one parallelizes a query. In particular I am interested in doing so in python with mysql or sqlite, but I'd also like to do this in other settings such as in an Oracle database. Thanks – Mathew Feb 22 '23 at 00:52
  • AFAIK, SQLite is very basic and does not support such feature (it actually does not support many feature). It is pretty inefficient for most operations. If performance matters and you are working on a large database, then SQLite is certainly not the right tool. – Jérôme Richard Feb 25 '23 at 15:50
  • which version of SQL would you recommend? – Mathew Feb 26 '23 at 22:54

2 Answers2

3

SQLite does not support parallelization out of the box, and the PARALLEL keyword that you're using in your code has no effect on the execution of your query. The reason your parallel and non-parallel queries take the same amount of time is likely due to the fact that SQLite is not designed for parallel execution and is limited to running queries on a single thread.

To achieve parallelization with SQLite, you can explore the possibility of using external libraries or tools, such as the SQLite Virtual Table extension for parallel queries or utilizing multi-threading in Python to run multiple queries in parallel.

gerpaick
  • 801
  • 2
  • 13
  • 36
0

I think you should try PostgreSQL for better parallel processing. SQLite is a in memory DB which is not designed for high load operations.

auvipy
  • 769
  • 10
  • 22