I'm using pypyodbc and pandas.read_sql_query to query a cloud stored MS Access Database .accdb file.
def query_data(group_id,dbname = r'\\cloudservername\myfile.accdb',table_names=['ContainerData']):
start_time = datetime.now()
print(start_time)
pypyodbc.lowercase = False
conn = pypyodbc.connect(
r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"+
r"DBQ=" + dbname + r";")
connection_time = datetime.now()-start_time
print("Connection Time: " + str(connection_time))
querystring = ("SELECT TOP 10 Column1, Column2, Column3, Column4 FROM " +
table_names[0] + " WHERE Column0 = " + group_id)
my_data = pd.read_sql_query(querystring,conn)
print("Query Time: " + str(datetime.now()-start_time-connection_time))
conn.close()
return(my_data)
The database has about 30,000 rows. The group_id are sequential numbers from 1 to 3000 with 10 rows assigned to each group. For example, rows 1-10 in the database (oldest date) all have group_id=1. Rows 2990-3000 (newest data) all have group_id = 3000.
When I store the database locally on my PC and run query_data('1')
the connection time is 0.1s and the query time is 0.01s. Similarly, running query_data('3000')
the connection time is 0.2s and the query time is 0.08s.
When the database is stored on the cloud server, the connection time varies from 20-60s. When I run query_data('1')
the query time is ~3 seconds. NOW THE BIG ISSUE: When I run query_data('3000')
the query time i ~10 minutes!
I've tried using ORDER BY group_id DESC
but that causes both queries to take ~ 10 minutes.
I've also tried changing the "Order by" group_id to Descending in the accdb itself and setting "Order by on load" to yes. Neither of these seem to change how the SQL query locates the data.
The problem is, the code I'm using almost always needs to find the newest data (e.g. group_id = max) which takes the longest amount of time to find. Is there a way to have the SQL query reverse it's searching order, so that the newest entries are looked through first, rather than the oldest entries? I wouldn't mind a 3 second (or even 1 minute) query time, but a 10 minute query time is too long. Or is there a setting I can change in the access database to change the order in which the data is stored?
I've also watched the network monitor while running the script, and python.exe steadily sends about 2kb/s and receives about 25kb/s throughout the full 10 minute duration of the script.