10

This is a weird one. I am trying to use Views in MySQL (I'm reasonably new to MySQL with more experience with Sybase and SQL Server). Any way this new project we are using MySQL as it seems to have good performance. However to make querying for a web front end simpler we decided to create a few views, all work well, but they take forever to run.

The views are very simple, just select statements (these tables do have a few million rows in them). Say for example this query:

SELECT CAST(classifier_results.msgDate as DATE) AS mdate
       ,classifier_results.objClass AS objClass
       ,COUNT(classifier_results.objClass) AS obj
       ,classifier_results.subjClass AS subjClass
       ,COUNT(classifier_results.subjClass) AS subj 
FROM classifier_results 
WHERE (classifier_results.msgDate >= (curdate() - 20)) 
GROUP BY 
  CAST(classifier_results.msgDate as DATE)
  ,classifier_results.objClass
  ,classifier_results.subjClass 
ORDER BY classifier_results.msgDate DESC

When run as a normal select takes around 1.5 seconds to return a result.

However when this query is put into a view (as is) - i.e.

CREATE VIEW  V1a_sentiment_AI_current AS    
SELECT CAST(classifier_results.msgDate as DATE) AS mdate
       ,classifier_results.objClass AS objClass
       ,COUNT(classifier_results.objClass) AS obj
       ,classifier_results.subjClass AS subjClass
       ,COUNT(classifier_results.subjClass) AS subj 
FROM classifier_results 
WHERE (classifier_results.msgDate >= (curdate() - 20)) 
GROUP BY 
  CAST(classifier_results.msgDate as DATE)
  ,classifier_results.objClass
  ,classifier_results.subjClass 
ORDER BY classifier_results.msgDate DESC

The query takes about 10 times longer (22-30 seconds). So I'm thinking maybe there is some optimization or query caching that doesnt work with Views or maybe there is some setting we've missed in the MySQL config. But is there any way to speed up this view so its just a nice placeholder for this query?

Running EXPLAIN on the two queries: The normal select gives:

1, SIMPLE, classifier_results, ALL, idx_date, , , , 594845, Using where; Using temporary; Using filesort

The view select gives:

1, PRIMARY, , ALL, , , , , 100,
2, DERIVED, classifier_results, ALL, idx_date, , , , 594845, Using where; Using temporary; Using filesort

Johan
  • 74,508
  • 24
  • 191
  • 319
NightWolf
  • 7,694
  • 9
  • 74
  • 121
  • If you use EXPLAIN for both the query and selecting from the view, do you get different results? – Cocowalla Sep 24 '11 at 04:04
  • Added to the question. The query plan looks the same, im assuming th eprimary is just a return from the view as it is nested in a sense, nothing that would indicate 20seconds+ extra to run.... – NightWolf Sep 24 '11 at 04:08
  • 1
    I think `DERIVED` means it is using a temporary table, which is killing performance – Cocowalla Sep 24 '11 at 04:13
  • 4
    I think it looks fine. But so you chill I've remove the tildas. Happy now? Anything productive? – NightWolf Sep 24 '11 at 04:31
  • I seem to remember that MySQL VIEWs are not as optimized as you may find in other engines. A quick google search led me to this article about [MySQL VIEW as performance troublemaker](http://www.percona.com/blog/2007/08/12/mysql-view-as-performance-troublemaker/) – wes.hysell Nov 22 '14 at 18:50

3 Answers3

2

This is a really common problem. It can be very hard to write DRY, re-usable SQL. There is a workaround I've found though.

Firstly, as others have pointed out, you can and should use VIEWs to do this wherever possible using the set ALGORITHM = MERGE, so that any queries using them are optimised on the merged SQL statement's where clause rather than having the VIEW evaluated for the entire view which can be catastrophically large.

In this case, since you cannot use MERGE because of the group/count aspect, you might want to try using a stored procedure that creates a temporary session table as a workaround.

This technique allows you to write reusable queries that can be accessed from middleware / framework code and called from inside other stored procedures, so you can keep code contained, maintainable and reusable.

I.e. if you know in advance that the query will be filtered on certain conditions, put those in a stored procedure. (It may be more efficient to post-filter the data set, or a combination - it depends how you use the data and what common sets are needed).

CREATE PROCEDURE sp_create_tmp_V1a_sentiment_AI_current(parm1, parm2 etc)
BEGIN

  drop temporary table if exists tmp_V1a_sentiment_AI_current;

  create temporary table tmp_V1a_sentiment_AI_current
  as
  SELECT CAST(classifier_results.msgDate as DATE) AS mdate
         ,classifier_results.objClass AS objClass
         ,COUNT(classifier_results.objClass) AS obj
         ,classifier_results.subjClass AS subjClass
         ,COUNT(classifier_results.subjClass) AS subj 
  FROM classifier_results 
  WHERE (classifier_results.msgDate >= (curdate() - 20)) 
  -- and/or other filters on parm1, parm2 passed in
  GROUP BY 
    CAST(classifier_results.msgDate as DATE)
    ,classifier_results.objClass
    ,classifier_results.subjClass 
  ORDER BY classifier_results.msgDate DESC;

END;

Now, any time you need to work with this data, you call the procedure and then either select the result (possibly with additional where clause parameters) or join with it in any other query.

The table is a session temporary table so it will persist beyond the call to the procedure. The calling code can either drop it once it's finished with the data or it'll go automatically when the session goes or a subsequent call to the sproc is made.

Hope that's helpful.

Gruff
  • 524
  • 5
  • 13
2

Try re-creating your view using this:

CREATE ALGORITHM = MERGE VIEW `V1a_sentiment_AI_current` AS    
SELECT CAST(classifier_results.msgDate as DATE) AS mdate
   ,classifier_results.objClass AS objClass
   ,COUNT(classifier_results.objClass) AS obj
   ,classifier_results.subjClass AS subjClass
   ,COUNT(classifier_results.subjClass) AS subj 
FROM classifier_results 
WHERE (classifier_results.msgDate >= (curdate() - 20)) 
GROUP BY 
  CAST(classifier_results.msgDate as DATE)
  ,classifier_results.objClass
  ,classifier_results.subjClass 
ORDER BY classifier_results.msgDate DESC

More information on MySQL's view processing algorithms can be found here.

Johan
  • 74,508
  • 24
  • 191
  • 319
Cocowalla
  • 13,822
  • 6
  • 66
  • 112
  • Ok thanks for this link. The view was UNDEFINED before. By trying to change the algo to MERGE gives: 0 row(s) affected, 1 warning(s): 1354 View merge algorithm can't be used here for now (assumed undefined algorithm). Trying to create the view with the TEMPTABLE algo works fine. So with UNDEFINED im guessing it might be using TempTable as undefined seems to make a choice between merge and temptable. So this may be the issue, as the manual says merge is more efficient... – NightWolf Sep 24 '11 at 04:19
  • Try removing the `ORDER BY` clause from the view and see if it will work with the `MERGE` algorithm – Cocowalla Sep 24 '11 at 04:22
  • Removing the `ORDER BY` still doesnt want to work as merge. `1354 View merge algorithm can't be used here for now (assumed undefined algorithm)` – NightWolf Sep 24 '11 at 04:24
  • Sorry, I'm out of ideas :( MySQL is saying it can't use the `MERGE` algo, but I'm not sure why - hopefully someone else can help out. – Cocowalla Sep 24 '11 at 04:29
  • Learned a lot from both question and answer/comments, thanks - in the link provided in the answer it says that the `MERGE` algo cannot work with count in the select part of the view. – miherrma Sep 24 '11 at 05:00
  • It would appear that thats the case. SO is there any work around to have a view query using count run fast? There is this but they dont have a work around for the count issue. http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-performance-problems-of-temptable-views/ – NightWolf Sep 24 '11 at 09:23
  • I found that removing all aggregation functions and the `GROUP BY` from the view will allow performance to return to normal. This means each of your queries using the view will require `GROUP BY`, but at least it helps manage some complexity (especially if you have `JOINS` in the view). – adamlamar Dec 24 '12 at 16:11
  • `MERGE` algorithm is not allowed when using `COUNT()` `SUM()` and so on! – funder7 Dec 06 '20 at 20:05
0

MERGE can't be used here because of the count() aggregates in the select list; it might help in these cases to specify TEMPTABLE to save the engine from having to decide between them. Once I'd decided which algorithm to use I'd look at the EXPLAIN plan and try to add an index hint or locate a missing index.

j_d_b
  • 638
  • 5
  • 11