
I'm using Python to generate reports from a MySQL database. The report retrieves data from a single table, which has an index on f_id. In the report I'm filtering by various combinations of col1 - col5 and displaying the count of matching records. Simplified, it looks something like this:

def get_query_result(f_id, col1, col2, col3, col4, col5):
    # mydb is an existing MySQL connection object created elsewhere
    mycursor = mydb.cursor()
    sql =  ("SELECT COUNT(*) "
            "FROM mytable "
            "WHERE f_id = %s " 
            "AND col1 = %s " 
            "AND col2 = %s " 
            "AND col3 = %s " 
            "AND col4 = %s " 
            "AND col5 = %s " 
            )
    val = (f_id, col1, col2, col3, col4, col5)
    mycursor.execute(sql, val)
    myresult = mycursor.fetchall()
    return myresult[0][0]

print(get_query_result(123, 'a', 'a', 'a', 'x', 1))
print(get_query_result(123, 'a', 'b', 'a', 'x', 1))
print(get_query_result(123, 'a', 'a', 'b', 'x', 1))
print(get_query_result(123, 'a', 'a', 'a', 'z', 1))
print(get_query_result(123, 'a', 'a', 'a', 'z', 2))
print(get_query_result(123, 'a', 'a', 'a', 'z', 3))
print(get_query_result(123, 'a', 'a', 'a', 'z', 4))
...

There are about 100 numbers to calculate, which results in 100 individual queries. There is no particular pattern in the combinations of values.

I'm thinking there should be a better way than generating 100 unique sql queries and instead query the database once and then perform the calculations in memory.

I was looking into pandas, but just importing pandas takes 1 second, which is way too slow.

ColdFusion used to have a neat feature where you could query the database once and then run further queries against that query result using regular SQL.

uwe
  • You need to provide more context and show your code for possible suggestions on how it could be improved. The table structure matters too - if the WHERE clause is applied to unindexed fields, this would result in a full table scan, which can be slow depending on the size of the table. – Kate Nov 08 '21 at 16:43
  • After doing some more research and looking into MySQL MEMORY tables, this post https://stackoverflow.com/questions/61639493/proper-use-of-mysql-memory-tables-for-performance pointed out that given a large enough innodb_buffer_pool_size, MySQL does a great job of internally caching data in memory. Since my report with 100 queries runs in ~100ms, it's likely making great use of that cache and I shouldn't have to worry about explicitly writing code to cache the data (the current setting can be checked as shown below). – uwe Nov 09 '21 at 16:21
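
For reference, a quick way to inspect that setting on your own server (standard MySQL syntax):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';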

2 Answers


If you need to repeat a query a hundred times, then your intuition is right: there should be a better way.

What I would do in your situation is tweak the SQL with GROUP BY, so that you get an aggregate result set with precalculated counts, e.g.:

SELECT f_id, col1, col2, col3, col4, col5, COUNT(*) as cnt
FROM mytable
GROUP BY f_id, col1, col2, col3, col4, col5

Then, once you have loaded that list of results, to do the equivalent of

print(get_query_result(123, 'a', 'a', 'a', 'x', 1))

you could use a list comprehension like:

[item["cnt"] for item in myresult
    if item["fid"] == 123
    and item["col1"] == 'a'
    and item["col2"] == 'a'
    and item["col3"] == 'a'
    and item["col4"] == 'x'
    and item["col5"] == 1
]

(This is an untested approximation, but you get the idea.) The point is to filter your list to locate the "row" that fulfills all the search criteria.

In order to access fields by column name, I think you need something like cursor = connection.cursor(dictionary=True) for your MySQL cursor (please see here if this is the library you are using). This is more convenient and less error-prone than using list indexes.
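
To make this concrete, here is a minimal, untested sketch along those lines, assuming the dictionary cursor from mysql-connector-python and an existing connection object mydb as in the question; the helper name load_counts is just illustrative:

def load_counts():
    # One query that aggregates every combination of values up front
    cursor = mydb.cursor(dictionary=True)
    cursor.execute(
        "SELECT f_id, col1, col2, col3, col4, col5, COUNT(*) AS cnt "
        "FROM mytable "
        "GROUP BY f_id, col1, col2, col3, col4, col5"
    )
    # Build an in-memory lookup keyed by the filter combination
    return {
        (row["f_id"], row["col1"], row["col2"], row["col3"], row["col4"], row["col5"]): row["cnt"]
        for row in cursor.fetchall()
    }

counts = load_counts()
print(counts.get((123, 'a', 'a', 'a', 'x', 1), 0))  # 0 when no rows match

A dict lookup like this avoids scanning the result list for each of the 100 combinations, and combinations that don't exist in the table simply come back as 0.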

Kate

If you run the query often, add INDEX(f_id, col1, col2, col3, col4, col5)
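
For example (the index name idx_fid_cols is just illustrative):

ALTER TABLE mytable ADD INDEX idx_fid_cols (f_id, col1, col2, col3, col4, col5);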

Rick James