17

We're having some odd issues with MySQL inner joins. Basically, we get an odd error when using an '=' operator but using 'like' instead makes it work. Unfortunately, this is via ActiveRecord and no easy way to just slap 'like' in there instead, plus we want to understand what's actually happening here.

Here is the query that fails:

mysql> SELECT COUNT(*) FROM `versions` INNER JOIN `site_versions`
              ON `versions`.id = `site_versions`.version_id;

Here is the error:

ERROR 1296 (HY000): Got error 20008 'Query aborted due to out of query memory'
from NDBCLUSTER

Here is the query that works:

mysql> SELECT COUNT(*) FROM `versions` INNER JOIN `site_versions`
              ON `versions`.id like `site_versions`.version_id;

Here are some details on the tables themselves:

mysql> desc site_versions;
+----------------------+----------+------+-----+---------+----------------+
| Field                | Type     | Null | Key | Default | Extra          |
+----------------------+----------+------+-----+---------+----------------+
| id                   | int(11)  | NO   | PRI | NULL    | auto_increment |
| version_id           | int(11)  | YES  | MUL | NULL    |                |
[..snip..]
+----------------------+----------+------+-----+---------+----------------+

mysql> desc versions;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
[..snip..]
+------------+--------------+------+-----+---------+----------------+

Any ideas why the 'like' works and the '=' does not?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Eric Anderson
  • 1,047
  • 1
  • 9
  • 13
  • 4
    is the like actually giving results you expect? maybe it's just failing to join anything (because what does like mean for ints?) and so avoiding the memory problem that the valid query is triggering. – andrew cooke May 22 '12 at 03:15
  • 1
    @andrewcooke - Hmm, `like` does seem to work when used that way: http://sqlfiddle.com/#!2/86792/1 – Paul Bellora May 22 '12 at 03:26
  • 3
    It might also help the question to show the `explain` of each query. – Paul Bellora May 22 '12 at 03:27
  • 2
    I think you need to include info on the database engine for the tables. Strings comparison differs greatly between the different options. – Mahmoud Al-Qudsi May 22 '12 at 05:02
  • @andrewcooke Yes it does return valid results. – Eric Anderson May 22 '12 at 14:33
  • This is using NDB/CLUSTER database engine. I don't have the explain, but I can probably dig it up. – Eric Anderson May 22 '12 at 14:35
  • what does it (the like) mean? does it coerce the ints to strings before comparing? is that breaking indexing? i guess this is why people are asking for explains? – andrew cooke May 22 '12 at 14:49
  • `LIKE` forces `site_versions` to be leading. Could you please run `SELECT COUNT(*) FROM site_versions STRAIGHT_JOIN versions ON versions.id = site_versions.version_id`? – Quassnoi May 22 '12 at 18:09
  • 4
    It seems to me that the query you have itsn't that informative it should only give you the count of site_versions where the version_id is not null, so there is no need for the join. Could you explain what you want know from the data? – Robert May 23 '12 at 07:42

3 Answers3

3

Ironically this seems to be related to optimizations; by using LIKE you force MySQL to stop using possible indexes (at least with number columns, because it has to cast them all to string for the comparison).

So by using = it seems that MySQL simply runs out of space (memory / disc) to use the index (check the key_buffer setting).

Of course, this is just a hunch and I don't know much about NDB to help you further, but I hope this moves you in the right direction.

Ja͢ck
  • 170,779
  • 38
  • 263
  • 309
1

I can't imagine how like works well in this situation, but how about just trying to check out the value of variable MaxAllocate and increase it? Its default value is 32M and can be increased up to 1G.

Reference : http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-maxallocate

lqez
  • 2,898
  • 5
  • 25
  • 55
0

Not sure if this will fix your problem, but it's possible. Don't use count(*). Maybe it's getting 'confused' about what to count. It's best-practice to count a specific field e.g. id. To do this, you need to use an alias for the tables.

SELECT COUNT(v.id) FROM versions as v
INNER JOIN site_versions as sv ON v.id = sv.version_id;
bfavaretto
  • 71,580
  • 16
  • 111
  • 150
seanmrafferty
  • 333
  • 2
  • 11
  • 1
    Or just use a constant: `SELECT COUNT(1) FROM ...` – bjnord May 22 '12 at 21:10
  • 4
    In an unusual situation like the OP's one must not be too much surprised at suggestions like yours. And I won't be. But this statement, *‘It's best-practice to count a specific field’*, made in the context of the other one, namely *‘Don't use `count(*)`’*, sounds like a general advice for the future and as such is arguable, to say the least. Using `COUNT(*)` to count rows in a group regardless of the data in the rows or of any other condition is perfectly valid and according to the standard. Maybe there are some implications to it in MySQL, but then you might have to mention that it was so. – Andriy M May 23 '12 at 11:08
  • @Andriy - I also always assumed `count(*)` should be avoided when possible? Although the syntax is valid and maybe not related to op's issue, there is indeed [a notable difference performance wise](http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/). I am not DBA though, but as developper have always avoided `count(*)` when possible because of this. I may have misunderstood something here, please clue me in if I did. – Stephane Gosselin May 29 '12 at 08:03
  • @stefgosselin there's a difference alright; in the link you posted `count(*)` outperforms a `count()` on a nullable column :) – Ja͢ck May 29 '12 at 08:20
  • @stefgosselin: Well, the article does demonstrate the difference in performance between `count(*)` and `count(col)` and the difference seems to be in favour of `count(*)`, unless it is me who is missing something. So, as a developer, you might want to be interested in using more efficient constructs, and therefore my question is, why avoid `count(*)`? – Andriy M May 29 '12 at 08:20
  • @jack @Andriy - Yes, I realise after re-reading closely *my own link* count(*) is indeed faster... . I had always used `count()` in the same manner as `select()`, avoiding when possible the `*`. Funny thing is, after all those years in 3 different dev teams and many code reviews, I am amazed this was never caught, though to be fair I did a lot of frontend stuff. Thanks guys, love this site, really appreciate your replies. – Stephane Gosselin May 29 '12 at 08:52