2

I have a table TEST with two columns:

  • A varchar(250)
  • B tinyint(1)

The table has about 4 million rows. A contains UTF8 strings, B can only be 0 or 1.

select count(1) from TEST is very fast (as of MySQL Workbench 0,000 sec), but select count(1) from TEST where B=1 takes about 15 seconds (on a quite fast machine, but on a real table with more columns that should not matter for this problem). Adding an index for B did not help - it still makes a full table scan. Forcing the index usage did not help neither.

The storage engine is MyISAM and because there are much, much more selects than inserts/updates, this is probably the best choice.

How can this query be speeded up?

O. Jones
  • 103,626
  • 17
  • 118
  • 172
André
  • 405
  • 5
  • 16
  • If performance is impacted in this way you might consider creating two tables: one containing the data where b=1 and one for b=0 – theking2 Oct 30 '21 at 12:21
  • One other thing to experiment with is the engine you use or the type of index – theking2 Oct 30 '21 at 12:21
  • Have a look here, if indices dont help: https://stackoverflow.com/questions/1332624/speeding-up-row-counting-in-mysql – Mandraenke Oct 30 '21 at 12:22
  • There is also some difference between InnoDB anD MyISAM where the latter is often faster but less reliable. – Mandraenke Oct 30 '21 at 12:24
  • What is the cardinality of the data for colimn B? What is the % of 0 vs 1 ? – Stu Oct 30 '21 at 12:28
  • 1
    The performance difference makes me think your table is in the MyISAM storage engine. Is it? MyISAM tables keep a persistent count of rows, so querying `SELECT COUNT(*)` but that only makes it fast if your query has no WHERE clause. – Bill Karwin Oct 30 '21 at 12:57
  • It's MyISAM (I edited the question right now). There are much more rows with "1" than with "0". Two tables are no option, because this would need changes in a lot of applications. – André Oct 30 '21 at 13:26
  • @Mandraenke - InnoDB has improved over the years, while MyISAM has been ignored. Most use cases show InnoDB running as fast or faster than MyISAM. – Rick James Oct 31 '21 at 01:04
  • @André - Please explain the use of `B`; there may be some reasonable workarounds involving a trigger, a separate table, a user-expectation change, etc. – Rick James Oct 31 '21 at 01:05
  • @Rick James - It's just a flag. There are a few thousand changes of it per day. I need to get a list of every A, where B is 1 (or 0, depending on the situation / application / question). – André Oct 31 '21 at 06:24
  • Are you hoping to speed up the COUNT? Or the Select of possibly millions of A's? Or both? – Rick James Oct 31 '21 at 06:29
  • I need a list of every "A" or of every "A" where "B" is 1 (or 0). ```select A from TEST``` is quite fast (about 2 seconds including the transfer to the application that uses this list), but adding ```where B=1``` makes it much slower (15 seconds). The first requires a full table scan, the second one a full index scan. This should not have such an impact. – André Oct 31 '21 at 06:33
  • Right now there are 9685 entries with B=0 and 4288552 entries with B=1. – André Oct 31 '21 at 06:35
  • If you actually want to know why `select A from TEST` is (or rather: can be) faster than `select A from TEST where B=1`, please don't ask why `select count(1) from TEST where B=1` is slower than `select count(1) from TEST` (unless you also want to know this). The answers to those two questions are actually different. – Solarflare Oct 31 '21 at 10:55

2 Answers2

0

Sad to say, you have presented a nasty problem. There's no magic that can avoid a full table or full index scan for the query you showed us.

Edit Why is simply counting all your rows so miraculously fast? @BillKarwin explained that in his comment. You use the MyISAM storage engine, which maintains the table's overall row count in its metadata and so can satisfy SELECT COUNT(*) FROM mytbl instantaneously. Once you add a WHERE clause it needs to scan all rows.

Suggestions to improve the situation involve using some sort of summary table. In some other RDBMS SQL this would be called a "materialized view." For example you could create this table:

CREATE TABLE tablecounts AS
SELECT COUNT(*) num, B
  FROM mytbl
 GROUP BY B;
ALTER TABLE tablecounts ADD UNIQUE INDEX B (B);

Then once in a while, maybe from a recurring EVENT, you update it:

INSERT INTO tablecounts (num, B)
SELECT COUNT(*) num, B
  FROM mytbl
 GROUP BY B
    ON DUPLICATE KEY UPDATE SET num=VALUES(num);

Then your app can do SELECT num FROM tablecounts WHERE B=1 to get the count you mentioned in your question.

This isn't a perfect solution. This count will, of course, become stale as other parts of your application INSERT, UPDATE, or DELETE rows in your original table until you run the event. But you'll get an approximate count very quickly.

Edit If you actually need the detailed results of SELECT A FROM tbl WHERE B = 0 an index on (B, A) will help your performance by only scanning the relevant rows. If most of your rows have B = 1 the index will help a lot by giving MySql a way to ignore them completely.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Thanks. In the real application, I don't need the count, but a list of every "A", where "B" is 1 (or 0, depending on the situation / question / application). The value of "B" changes for a few thousand rows each day. Maintaining two tables would require changes in many applications. ```select A from TEST``` is quite fast and requires a full table scan too. I clearly understand that adding ```where B=1``` requires a full index scan, but I don't get why this is so much slower. – André Oct 31 '21 at 06:29
-1

B-tree indexes are most effective for high-cardinality data (i.e. columns with many possible values, where the data in the column is unique or almost unique). B-tree indices are bad on such a small cardinality columns. create bitmap index.

Bitmap Indexing is a special type of database indexing that uses bitmaps. This technique is used for huge databases, when column is of low cardinality and these columns are most frequently used in the query.

In your case there are just two values for B column so,low cardinality then use bitmap index use this

CREATE BITMAP INDEX Index_Name ON TEST (B); 
Nikhil S
  • 3,786
  • 4
  • 18
  • 32