0

I'm trying to optimize my queries by using ColdFusion's Query of Queries feature to access a cached query of about 45,000 words.

With this below query I had lots of success in speed switching to QoQ:

<cfquery name="FindAnagrams" dbtype="query" >
    SELECT AllWords.Word, AllWords.AnagramKey
    FROM AllWords
    WHERE AllWords.WordLength = #i#
</cfquery>

Executions went from ~400ms to ~15ms.

This below query however was only slightly reduced in execution time (from ~500ms to ~400ms):

 <cfquery name="TopStartWith" dbtype="query" maxrows="15">
    SELECT AllWords.Word
    FROM AllWords
    WHERE AllWords.Word LIKE <cfoutput>'#Word#%' </cfoutput> 
    AND AllWords.Word <> '#Word#'
    ORDER BY AllWords.Frequency DESC;
  </cfquery> 

Removing 'Maxrows' did not really help. My Database fields are indexed and I'm ad the end of my knowledge of optimizing queries (Can you index a column of a CF QoQ object?) I suspect it is the 'ORDER BY' that is causing the delay, but am unsure. How can I further improve the speed of such queries? Many thanks.

kbtbc
  • 1
  • 1
  • 2
    Where did ` AllWords.Frequency` come from? Also you don't need `` in a ``. You might get better performance if you used `` – James A Mohler Jun 17 '14 at 19:53
  • Thank you. AllWords.Frequency is a field in the cached query -- I do apologize for inexperience (I only know CF from years ago, with a sudden interest to develop a word learning app for my word game playing; see [link](http://bestwordplay.com). I will test with . – kbtbc Jun 17 '14 at 20:12
  • Update: I removed and tested with -- the results are negligible. – kbtbc Jun 17 '14 at 20:32
  • QofQ is slow with large recordsets. Bearing in mind that array functions work on the columns and that you also have a valuelist available, list or array functions might be a better approach. – Dan Bracuk Jun 17 '14 at 21:44
  • 4
    I'd put the data back in the DB where it belongs, and do the querying there where you can index it properly. That's what a DB is for. – Adam Cameron Jun 17 '14 at 21:47
  • The other argument in favour of simply running db queries is that you are always querying the latest data. That is not necessarily the case with cached queries. – Dan Bracuk Jun 17 '14 at 23:35
  • Thank you. The data is this case is quite static, it's the queries that get very dynamic -- I'm basically searching words in a resulting word list for letter patterns and matching other words with similar cont. The query that I am caching through CF is a full list of words in the db, with some word stats. Originally I was running these same queries directly from the db, and caching did offer a boost, so I'm not sure how going back to the db will offer performance. Perhaps there is a faster approach with array functions. I will continue to test and explore options from all feedback. – kbtbc Jun 18 '14 at 03:25
  • 1
    Bottom line, QoQ cannot really be optimized beyond what you have done. I would hazard a guess that a query run directly against the database would perform better than QoQ. As Adam pointed out, that is what databases do. You could also look into full text search, or dump these items into a Solr collection, might actually give you better results than a `LIKE` query. – Scott Stroz Jun 18 '14 at 03:40

2 Answers2

2

For optimizing the second query, there are a couple of approaches you could take.

Firstly, see if your database supports something like function-based indexes (an oracle term, but it is available in other platforms). See this for a mySQL example: Is it possible to have function-based index in MySQL?

Secondly, you could pre-process your words into a structure which supports the query you're after. I'm assuming you're currently loading the query into application or session scope elsewhere. When you do that you could also process the words into a structure like:

{
    'tha':['thames','that'],
    'the':['them','then','there'],
    //etc
}

Instead of running a QoQ, you get the first 3 letters of the word, look up the array, then iterate over it, finding matches. Essentially, it's pretty similar to what a function-based index is doing, but in code. You're trading memory for speed, but with on 45000 words, the structure isn't going to be enormous.

Community
  • 1
  • 1
barnyr
  • 5,678
  • 21
  • 28
0

The LIKE clause probably causes the poor performance of your second query. You can see a similar performance penalty if you use LIKE in a regular database query. Since LIKE performs a wildcard search against the entire string stored in the database column, it can't just do an EQUALS comparison.

Carl Von Stetten
  • 1,149
  • 8
  • 13