
I'm using Python to generate reports from a MySQL database. The report retrieves data from a single table, which has an index on f_id. In the report I'm filtering by various combinations of col1 - col5 and displaying the count of matching records. Simplified, it looks something like this:

def get_query_result(f_id, col1, col2, col3, col4, col5):
    # mydb is an existing MySQL connection object created elsewhere
    mycursor = mydb.cursor()
    sql =  ("SELECT COUNT(*) "
            "FROM mytable "
            "WHERE f_id = %s " 
            "AND col1 = %s " 
            "AND col2 = %s " 
            "AND col3 = %s " 
            "AND col4 = %s " 
            "AND col5 = %s " 
            )
    val = (f_id, col1, col2, col3, col4, col5)
    mycursor.execute(sql, val)
    myresult = mycursor.fetchall()
    return myresult[0][0]

print(get_query_result(123, 'a', 'a', 'a', 'x', 1))
print(get_query_result(123, 'a', 'b', 'a', 'x', 1))
print(get_query_result(123, 'a', 'a', 'b', 'x', 1))
print(get_query_result(123, 'a', 'a', 'a', 'z', 1))
print(get_query_result(123, 'a', 'a', 'a', 'z', 2))
print(get_query_result(123, 'a', 'a', 'a', 'z', 3))
print(get_query_result(123, 'a', 'a', 'a', 'z', 4))
...

There are about 100 numbers to calculate, which results in 100 individual queries. There is no particular pattern in the combinations of values.

I'm thinking there should be a better way than generating 100 unique sql queries and instead query the database once and then perform the calculations in memory.

I was looking into pandas, but just importing pandas takes 1 second, which is way too slow.

ColdFusion used to have a neat feature where you could query the database once and then run further queries against that query result using regular SQL.

uwe
  • You need to provide more context and show your code for possible suggestions on how it could be improved. The table structure matters too - if the WHERE clause is applied to unindexed fields, this would result in a full table scan, which can be slow depending on the size of the table. – Kate Nov 08 '21 at 16:43
  • After doing some more research and looking into MySQL MEMORY tables, this post https://stackoverflow.com/questions/61639493/proper-use-of-mysql-memory-tables-for-performance pointed out that given a large enough innodb_buffer_pool_size, MySQL does a great job of internally caching data in memory. Since my report with 100 queries runs in ~100ms, it's likely making great use of that cache and I shouldn't have to worry about explicitly writing code to cache the data (the current setting can be checked as shown below). – uwe Nov 09 '21 at 16:21
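
For reference, a quick way to inspect that setting on your own server (standard MySQL syntax):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';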

2 Answers


If you need to repeat a query a hundred times, then your intuition is right: there should be a better way.

What I would do in your situation is tweak the SQL with GROUP BY, so that you get an aggregate result set with precalculated counts, e.g.:

SELECT f_id, col1, col2, col3, col4, col5, COUNT(*) as cnt
FROM mytable
GROUP BY f_id, col1, col2, col3, col4, col5

Then, once you have loaded that list of results, to do the equivalent of

print(get_query_result(123, 'a', 'a', 'a', 'x', 1))

you could use a list comprehension like:

[item["cnt"] for item in myresult
    if item["fid"] == 123
    and item["col1"] == 'a'
    and item["col2"] == 'a'
    and item["col3"] == 'a'
    and item["col4"] == 'x'
    and item["col5"] == 1
]

(This is an untested approximation, but you get the idea.) The point is to filter your list to locate the "row" that fulfills all the search criteria.

In order to access fields by column name, I think you need something like cursor = connection.cursor(dictionary=True) for your MySQL cursor (please see here if this is the library you are using). This is more convenient and less error-prone than using list indexes.
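
To make this concrete, here is a minimal, untested sketch along those lines, assuming the dictionary cursor from mysql-connector-python and an existing connection object mydb as in the question; the helper name load_counts is just illustrative:

def load_counts():
    # One query that aggregates every combination of values up front
    cursor = mydb.cursor(dictionary=True)
    cursor.execute(
        "SELECT f_id, col1, col2, col3, col4, col5, COUNT(*) AS cnt "
        "FROM mytable "
        "GROUP BY f_id, col1, col2, col3, col4, col5"
    )
    # Build an in-memory lookup keyed by the filter combination
    return {
        (row["f_id"], row["col1"], row["col2"], row["col3"], row["col4"], row["col5"]): row["cnt"]
        for row in cursor.fetchall()
    }

counts = load_counts()
print(counts.get((123, 'a', 'a', 'a', 'x', 1), 0))  # 0 when no rows match

A dict lookup like this avoids scanning the result list for each of the 100 combinations, and combinations that don't exist in the table simply come back as 0.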

Kate

If you run the query often, add INDEX(f_id, col1, col2, col3, col4, col5)
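
For example (the index name idx_fid_cols is just illustrative):

ALTER TABLE mytable ADD INDEX idx_fid_cols (f_id, col1, col2, col3, col4, col5);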

Rick James