1

I have a MyISAM table in a MySQL database on an Ubuntu 10.04 server with 256mb RAM. Maximum packet size is 32mb. The table has 150,000 rows.

The table has three columns, two of which are TEXT type. I create a FULLTEXT index on both of these; one index has cardinality equal to the number of rows in the table, the other only one third.

Why is that?

Does a low-cardinality index imply MATCH() AGAINST() will miss hits? (It seems so.) If so, can I increase a memory limit some where to increase the cardinality of my index?

mrdenny
  • 27,174
  • 4
  • 41
  • 69
SK9
  • 111
  • 4

2 Answers2

1

If you think that something is wrong, use the utility myisam_ftdump to examine the indexes. The cardinality depends on how many different words there are in the TEXT fields.

You haven't specified the contents of the fields, so the answer is not unique.

Also, if you haven't created a /etc/mystopwords file, MySQL uses its built-in list of stopwords. This list depends on collation, and may give false results if the collation of columns is different from the default.

Even the type of query may raise strange results.

In conclusion, the cardinality is not a problem until you know that something in the indexes is missing.

AndrewQ
  • 390
  • 3
  • 13
  • Thanks. I find it odd that each time I encounter this issue the index cardinality is exactly a fraction of the number of rows - either one third or one half. – SK9 Sep 08 '11 at 13:23
1

can I increase a memory limit some where to increase the cardinality of my index

Eh? No. The cardinality is the number of unique values in the data divided by the total number of values in the table - changing the amount of memory available won't change your data.

As AndrewQ says, using stopwords to be more selective about what gets indexed can have a lot of impact on the cardinality metric - and hence the the optimization of queries, however FULLTEXT indexing is a very blunt instrument for text searching.

Certainly there's not a lot of memory here (I assume this is a virtual machine) which will affect overall performance.

symcbean
  • 21,009
  • 1
  • 31
  • 52