15

MySQL Version: 5.5.37-0ubuntu0.14.04.1

I am currently writing a python script which makes use of numerous MySQL tables and queries to obtain results from an inverted index stored in tables.

I have noticed that choosing the right type of Cursor in the MySQLdb python module when executing a query has a really big effect on performance and was wondering if someone could explain or provide a reliable resource explaining which cursor to use when.

As an example, executing this query 40 times with SSCursor takes 7 seconds:

SELECT Pages.PageID,
       Pages.PageName,
       Counter AS TermFreq,
       Pages.Length,
       (Counter / LOG(Length)) AS Weight
FROM Pages
INNER JOIN TermOccurrences ON TermOccurrences.PageID = Pages.PageID
INNER JOIN Terms ON TermOccurrences.TermID = Terms.TermID
WHERE TermName = %s
ORDER BY Weight DESC
LIMIT 20;

Running the same query 40 times with the default Cursor takes 0.004 seconds.

Removing the calculation of weight (Counter/LOG(Length)) makes this query execute fast again using an SSCursor however.

I was using SSCursor because it was proving to have vastly superior performance on a number of other queries and then suddenly became very slow for this one. Changing back to the default Cursor surprised me when it executed so fast.

EDIT: Some more examples.

Running the following with the default cursor 40 times takes ~3 seconds:

SELECT COUNT(*)
FROM Pages
INNER JOIN TermOccurrences ON TermOccurrences.PageID = Pages.PageID
INNER JOIN Terms ON TermOccurrences.TermID = Terms.TermID
WHERE TermName = %s AND Counter > 2

Running it with SSCursor instead takes about 0.002 seconds.

Michael Aquilina
  • 5,352
  • 4
  • 33
  • 38
  • 3
    That's an excellent question, actually. Glancing at Python-MySQL's source, I couldn't find anything that'd make this that much slower... All I can think of is some sort of communication overhead or a MySQL bug that causes it to maybe recalculate the result set for each row fetched. – AKX Jul 17 '14 at 09:07
  • 1
    *How* did you measure performance differences? – Martijn Pieters Jul 24 '14 at 11:41
  • Im using a simple timer (`time` module in python) – Michael Aquilina Jul 24 '14 at 13:08
  • Can you check, please, whether it takes the same time for each one of the 40 iterations, or whether there is some startup overhead. – mdurant Jul 24 '14 at 20:40
  • If you are referring to the role of the Key Cache, it does play some part, there is still noticeable difference between the different uses of cursors across different queries. Is there some authoritative resource about the differences between the different types of cursors available? – Michael Aquilina Jul 25 '14 at 07:05
  • You state that in the first example the SSCursor is slower and in the second example that the default cursor is slower. Is that really happening? I guess you should either clarify or correct that, possibly with some precise measurements and code. – the Jul 27 '14 at 18:24
  • @MichaelAquilina did you see the above comment? You state that SSCursor is way slower than the default cursor and for the second query it's the exact opposite. Is that correct? – the Jul 30 '14 at 10:42
  • Hi @KasperSouren, sorry I've been away which didn't allow me to test this. I may have figured out the problem as it may more have been an issue with cache after all. Will run some tests and get back to you with some results! – Michael Aquilina Jul 30 '14 at 11:12

1 Answers1

9

The MySQLdb docs mention that the standard Cursor class is using mysql_store_result() whereas SSCursor is using mysql_use_result() "with the latter you must ensure all the rows have been read before another query can be executed".

So it's about mysql_store_result() vs mysql_use_result().

The MySQL docs mention that mysql_use_result() initiates a result set retrieval without actually reading the result set into the client, like mysql_store_result() does. So each row has to be retrieved individually with calls to mysql_fetch_row(), which of course can add up considerably when dealing with large tables.

Also in the MySQLdb docs:

SSCursor: A "server-side" cursor. Like Cursor but uses CursorUseResultMixIn. Use only if you are dealing with potentially large result sets.

So SSCursor is mainly good if your result set is too large to move into your client all at once.

See also these questions:

And note that a LIMIT 20 query can never really be that large. You might have to check your KEYs. To get a better idea about why that could take 7 seconds it's probably best to also include the db schema in a question, probably more something for DBA stack.

Community
  • 1
  • 1
the
  • 21,007
  • 11
  • 68
  • 101