0

I'm using pypyodbc and pandas.read_sql_query to query a cloud stored MS Access Database .accdb file.

def query_data(group_id,dbname = r'\\cloudservername\myfile.accdb',table_names=['ContainerData']):

        start_time = datetime.now()
        print(start_time)
        pypyodbc.lowercase = False
        conn = pypyodbc.connect(
            r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"+
            r"DBQ=" + dbname + r";")
        connection_time = datetime.now()-start_time
        print("Connection Time: " + str(connection_time))

        querystring = ("SELECT TOP 10 Column1, Column2, Column3, Column4 FROM " + 
            table_names[0] + " WHERE Column0 = " + group_id)

        my_data = pd.read_sql_query(querystring,conn)

        print("Query Time: " + str(datetime.now()-start_time-connection_time))

        conn.close()

        return(my_data)

The database has about 30,000 rows. The group_id are sequential numbers from 1 to 3000 with 10 rows assigned to each group. For example, rows 1-10 in the database (oldest date) all have group_id=1. Rows 2990-3000 (newest data) all have group_id = 3000.

When I store the database locally on my PC and run query_data('1') the connection time is 0.1s and the query time is 0.01s. Similarly, running query_data('3000') the connection time is 0.2s and the query time is 0.08s.

When the database is stored on the cloud server, the connection time varies from 20-60s. When I run query_data('1') the query time is ~3 seconds. NOW THE BIG ISSUE: When I run query_data('3000') the query time i ~10 minutes!

I've tried using ORDER BY group_id DESC but that causes both queries to take ~ 10 minutes.

I've also tried changing the "Order by" group_id to Descending in the accdb itself and setting "Order by on load" to yes. Neither of these seem to change how the SQL query locates the data.

The problem is, the code I'm using almost always needs to find the newest data (e.g. group_id = max) which takes the longest amount of time to find. Is there a way to have the SQL query reverse it's searching order, so that the newest entries are looked through first, rather than the oldest entries? I wouldn't mind a 3 second (or even 1 minute) query time, but a 10 minute query time is too long. Or is there a setting I can change in the access database to change the order in which the data is stored?

I've also watched the network monitor while running the script, and python.exe steadily sends about 2kb/s and receives about 25kb/s throughout the full 10 minute duration of the script.

Jamie
  • 11
  • 5
  • 2
    This sounds like a terrible plan. Access requires to either be on a local disk or SMB share (and on a SMB share, it requires a cabled LAN connection), and it sounds like your cloud host is neither which murders performance and can cause corruption. You can mess with the indexes to store new data at the top, but that will cause massive fragmentation and is generally terrible. The only real fix here is use a supported way to host your database. – Erik A Apr 29 '20 at 16:02
  • I'm not sure what you mean by it requires to be on a local disk or SMB share. A cloud server is a local disk, just stored somewhere else. The database front end and back end are stored on the same cloud drive and accessed through a citrix receiver. It seems to work quite well. The only problem is the slow query speed when working remotely. If the python script was run on the same virtual machine, would that help? – Jamie Apr 30 '20 at 14:16
  • Nope, a cloud server is not just a local disk stored elsewhere. Depending on the protocol used, you can't do partial file reads and writes, locking, etc. You can run the code on the machine that has direct access to the disk (or SMB access over a LAN connection, or perhaps an iSCSI share), but if you don't have that, you're in for lots of trouble, especially if you plan to modify the file. Note that even using SMB, Access has recently faced a bug that caused widespread corruption over network shares, which is only fixed in the most recent versions. – Erik A Apr 30 '20 at 14:32

0 Answers0