Slow loading SQL Server table into pandas DataFrame

Question

pandas becomes extremely slow when loading more than 10 million records from a SQL Server database using pyodbc, mainly through the function pandas.read_sql(query, pyodbc_conn). The following code takes 40-45 minutes to load 10-15 million records from the SQL table Table1.

Is there a better and faster way to read a SQL table into a pandas DataFrame?

import pyodbc
import pandas

# Placeholders: replace with your own connection details
server = '<server_ip>'
database = '<db_name>'
username = '<db_user>'
password = '<password>'
port = '1443'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';PORT='+port+';DATABASE='+database+';UID='+username+';PWD='+password)
cursor = conn.cursor()

data = pandas.read_sql("select * from Table1", conn) #Takes about 40-45 minutes to complete

Does `rows = cursor.execute("select * from Table1").fetchall()` take a similar amount of time? — Gord Thompson, Nov 20 '18 at 19:57
@W-B chunk does not help with the time issue. Still takes a lot of time to read. — Anjana Shivangi, Nov 26 '18 at 17:17
@GordThompson Thank you. I tried using execute() and fetchall(). Reading the pyodbc cursor object takes a reasonable amount of time, but converting the result into a pandas DataFrame takes forever. Please see [link](https://stackoverflow.com/q/53486051/7994141) — Anjana Shivangi, Nov 26 '18 at 17:19
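
The comments above try to separate the time spent fetching rows from the time spent building the DataFrame. A minimal sketch of one way to measure the two separately is shown here. It rests on some assumptions: an in-memory SQLite database stands in for the SQL Server connection so the snippet runs anywhere, `load_in_batches` is a hypothetical helper name, and the batch size is arbitrary. With pyodbc the same DB-API calls (`execute`, `fetchmany`, `description`) should apply, but the timings will differ.

```python
import sqlite3
import time

import pandas as pd


def load_in_batches(conn, query, batch_size=100_000):
    """Fetch rows with fetchmany() and time fetching vs. DataFrame conversion.

    Hypothetical helper: works with any DB-API connection (sqlite3 here,
    pyodbc in the question).
    """
    cursor = conn.cursor()
    cursor.execute(query)
    columns = [col[0] for col in cursor.description]
    frames = []
    fetch_seconds = 0.0
    convert_seconds = 0.0
    while True:
        start = time.perf_counter()
        rows = cursor.fetchmany(batch_size)
        fetch_seconds += time.perf_counter() - start
        if not rows:
            break
        start = time.perf_counter()
        # Convert each row to a plain tuple first; this is a precaution for
        # driver-specific row objects and a no-op for sqlite3 tuples.
        records = [tuple(row) for row in rows]
        frames.append(pd.DataFrame.from_records(records, columns=columns))
        convert_seconds += time.perf_counter() - start
    cursor.close()
    if frames:
        df = pd.concat(frames, ignore_index=True)
    else:
        df = pd.DataFrame(columns=columns)
    return df, fetch_seconds, convert_seconds


# Demo against a small in-memory SQLite table standing in for Table1.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Table1 (id INTEGER, value REAL)")
demo_conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)", [(i, i * 0.5) for i in range(1000)]
)
df, fetch_s, convert_s = load_in_batches(demo_conn, "select * from Table1", batch_size=300)
print(f"{len(df)} rows, fetch {fetch_s:.4f}s, convert {convert_s:.4f}s")
```

If the conversion time dominates, as the last comment reports, the bottleneck is on the Python side rather than in the database or the network.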

score 1 · Answer 1 · answered Feb 25 '19 at 21:21

I had the same problem with an even larger number of rows (~50 million). I ended up writing a SQL query, reading the result in chunks, and storing it in an .h5 file.

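import pandas as pd

# con is an open database connection (for example, the pyodbc connection from the question)
sql_reader = pd.read_sql("select * from table_a", con, chunksize=10**5)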

hdf_fn = '/path/to/result.h5'
hdf_key = 'my_huge_df'
store = pd.HDFStore(hdf_fn)
cols_to_index = [<LIST OF COLUMNS THAT WE WANT TO INDEX in HDF5 FILE>]

for chunk in sql_reader:
    store.append(hdf_key, chunk, data_columns=cols_to_index, index=False)

# index data columns in HDFStore
store.create_table_index(hdf_key, columns=cols_to_index, optlevel=9, kind='full')
store.close()

This way, we can later read the data back from the HDF5 file faster than with pandas.read_csv.
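
The answer does not show the read-back step, so here is a minimal sketch of it. Several things in it are assumptions: the small generated DataFrame stands in for the SQL result, the file is written to a temporary directory instead of /path/to/result.h5, and `id` is a hypothetical indexed column. Like the HDFStore code above, it needs the PyTables package (`tables`) to be installed.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the data loaded from SQL (assumption for illustration).
source_df = pd.DataFrame({"id": range(1000), "value": [i * 0.5 for i in range(1000)]})

hdf_fn = os.path.join(tempfile.mkdtemp(), "result.h5")
hdf_key = "my_huge_df"

# Write the table the same way as in the answer, with "id" as a data column.
with pd.HDFStore(hdf_fn) as store:
    store.append(hdf_key, source_df, data_columns=["id"], index=False)
    store.create_table_index(hdf_key, columns=["id"], optlevel=9, kind="full")

# Read everything back in one call.
full = pd.read_hdf(hdf_fn, hdf_key)

# Read only the rows matching a condition on a data column,
# without loading the whole table into memory first.
subset = pd.read_hdf(hdf_fn, hdf_key, where="id >= 990")

print(len(full), len(subset))
```

Only columns listed in `data_columns` can be used in a `where` condition, which is why the answer passes `cols_to_index` to `store.append`.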
