0

I've got a slow MySQL SELECT query, that I can't seem to troubleshoot.

It's a simple one, on a table with about 600,000 records.

SELECT * 
FROM  `civicrm_contact` contact
WHERE contact.external_identifier =123456

The Select query takes anywhere between 3-6 seconds, which make importing another 600,000 records that depend on this query, completely impractical.

The table indexes are shown in attached image:Table Indexes

If I search based on contact.id=123456 then the query time is down to about 0.004s. contact.id is the primary key on the table. external_identifier is a unique index.

bpmccain
  • 600
  • 4
  • 12
  • 25
  • 5
    do you absolutely need to `select *`? – nico Apr 23 '12 at 16:23
  • No - I just tested it by limiting it to SELECT id but the query is still at between 1.5 and 2 seconds. So it is better, but still too slow. – bpmccain Apr 23 '12 at 16:29
  • I'm running it on a 7.5GB Amazon EC2 Large Instance right now. The total DB size is about 1GB, so there is plenty of RAM. – bpmccain Apr 23 '12 at 16:33
  • if you have a unique numeric identifier, this it should probably be the `PRIMARY KEY` .. it seems that you have added the `id` column just *"because that's what i always do"*. – tereško Apr 23 '12 at 16:35
  • I'm using CiviCRM which creates its own id column with each new record. The external_identifier is used to sync those records with data coming from third party sources which may have different identifiers. I haven't added any columns to those that come as part of CiviCRM, and I don't want to get into changing Primary Keys at that would impact how the whole database operates. – bpmccain Apr 23 '12 at 16:39
  • Ensure you've configured MySQL to utilize the RAM. If you need further help, please post schema, EXPLAIN result, and SHOW PROFILE result for your improved query (without SELECT *). – Marcus Adams Apr 23 '12 at 16:40
  • 1
    What engine is this table using? Have you tried [`REPAIR TABLE`](http://dev.mysql.com/doc/refman/5.6/en/repair-table.html)? – eggyal Apr 23 '12 at 16:41
  • I changed external_identifier to INT (10) instead of VARCHAR. Query time is now 0.006s. Don't know broader implications of that change yet, but it seems like a good start. Will post as solution once my time limit on answering own questions expires. Thanks – bpmccain Apr 23 '12 at 17:03

4 Answers4

1

I know this is an old thread but as it relates to CiviCRM I thought I'd push out my thoughts. The fix is actually not best practise as you've altered one of the core packaged tables to get your query running faster. Although this may be ok for you I definitely would not recommend this to everyone.

Your solution has probably highlighted the problem with the query though, it seems you're telling the query that your expecting a number but in actually fact the data is stored as a VARCHAR. So I think simply putting single quotes around the value would have done the trick?

SELECT * FROM civicrm_contact contact WHERE contact.external_identifier = '123456'

Without this I'm pretty sure (having worked with Oracle for a number of years) that an implicit data type conversion would take place therefore the query is then not able to use the INDEX.

An explain plan should prove this theory.

Thanks

Parvez

Parvez Saleh
  • 136
  • 4
0

It seems you use a BTREE index. If you don't perform any range queries on this column (using <, >, <= or >=), you may want to use a hash based index.

See Comparison of B-Tree and Hash Indexes for detailed information.

And see here exact syntax.

yair
  • 8,945
  • 4
  • 31
  • 50
  • The `PRIMARY KEY` index on the `id` column is also a `BTREE` though, so this doesn't explain the performance difference. – eggyal Apr 23 '12 at 16:46
0

I changed the structure to make external_identifier of type INT instead of VARCHAR. Speed has increased to 0.006s

I'm not sure yet whether or not this will have any broader implications

bpmccain
  • 600
  • 4
  • 12
  • 25
0

I'm dubious of the data type conversion being the issue. I wonder if it's about the size limit of the indexed field. Being a btree means the index keys are potentially much bigger than a hash key would be. Is this necessary? Would it be better to store the external ids in a separate table, and link them based on a numeric id?

More questions than answers here really.

mc0e
  • 2,699
  • 28
  • 25