2

I know there are a few ways to retrieve data from RDB table. One with pandas as read_sql, the other with cursore.fetchall(). What are the main differences between both ways in terms of:

  1. memory usage - is df less reccomended?
  2. performance - selecting data from a table (e.g. large set of data)
  3. performace - inserting data with a loop for cursor vs df.to_sql.

Thanks!

General Grievance
  • 4,555
  • 31
  • 31
  • 45
Liorb
  • 21
  • 1
  • 4

1 Answers1

2

That's an interesting question. For a ~10GB SQLite database, I get the following results for your second question. pandas.sql_query seems comparable to speed with the cursor.fetchall.

The rest I leave as an exercise. :D

import sqlite3
import time
import pandas as pd

def db_operation(db_name):
    connection = sqlite3.connect(db_name)
    c = connection.cursor()
    yield c
    connection.commit()
    connection.close()

start = time.perf_counter()
for i in range(0, 10):
    with db_operation('components.db') as c:
            c.execute('''SELECT * FROM hashes WHERE (weight > 50 AND weight < 100)''')
            fetchall = c.fetchall()

time.perf_counter() - start
2.967 # fractional seconds


start = time.perf_counter()
for i in range(0, 10):
    connection = sqlite3.connect('components.db')
    sql_query = pd.read_sql_query('''SELECT * FROM hashes WHERE (weight > 50 AND weight < 100)''', con = connection)
    connection.commit()
    connection.close()

time.perf_counter() - start
2.983 # fractional seconds

The difference is that cursor.fetchall() is a bit more spartan (=plain). pandas.read_sql_query returns a <class 'pandas.core.frame.DataFrame'> and so you can use all the methods of pandas.DataFrame, like pandas.DataFrame.to_latex, pandas.DataFrame.to_csv pandas.DataFrame.to_excel, etc. (documentation link)

One can accomplish the same exact goals with cursor.fetchall, but needs to press some or a lot extra keys.

Konstantinos
  • 4,096
  • 3
  • 19
  • 28
  • Thanks you for your input. Is this considered to be a minor difference, or by definition, the use with cursore is better? – Liorb Feb 11 '22 at 06:25
  • @Liorb Yes, the performance differences virtually don't exist. However, I updated the answer: there are differences a bit further, in the easiness to further use the resulting data frame. – Konstantinos Feb 11 '22 at 13:40