0

I'm trying to do a search in MySQL where the user just has one field. The table looks like this:

ID          BIGINT
TITLE       TEXT
DESCRIPTION TEXT
FILENAME    TEXT
TAGS        TEXT
ACTIVE      TINYINT

Now if the user inputs just blah blubber, the search must check wether every word appears in the fields TITLE, DESCRIOTION, FILENAME or TAGS. The result itself should be ordered by relevance, so how often does a string appear in the record. I got this example data:

ID   | TITLE   | DESCRIPTION  | FILENAME | TAGS | ACTIVE
1    | blah    | blah         | bdsai    | bdha | 1
2    | blubber | blah         | blah     | adsb | 1
3    | blah    | dsabsadsab   | dnsa     | dsa  | 1

In this example, ID 2 must be at the top (2x blah, 1x blubber), then 1 (2x blah ) and then 3 (1x blah). This process should be dynamical so the user can also input more words and the relevance works same as with one or several words.

Is this possible to realize only in MySQL, or do I have to use some PHP? How would this work exactly?

Thank you very much for your help! Regards, Florian

EDIT: Here is the result after I tried the answer of Tom Mac:

I have four records which look like this:

ID  | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
1   | s     | s           | s        | s                             | 1
2   | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1
3   | 0     | s           | s        | s                             | 1
4   | a     | a           | a        | a                             | 1

Now, if I search for the string s, I should only get the top three records, ordered by a relevance of s. This means, the records should be orderer like this:

ID | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
2  | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1        <== 8x s
1  | s     | s           | s        | s                             | 1        <== 4x s
3  | 0     | s           | s        | s                             | 1        <== 3x s

Now, I tried my query like this (the table's name is PAGES):

select t . *
  from (

        select 
              match(title) against('*s*' in boolean mode) 
            + match(description) against('*s*' in boolean mode) 
            + match(filename) against('*s*' in boolean mode) 
            + match(tags) against('*s*' in boolean mode) 
            as matchrank,
                bb . *
          from pages bb) t
 where t.matchrank > 0
 order by t.matchrank desc

This query returns this:

matchRank | ID  | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
2         | 2   | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1

Is this because of the wildcards? I think, the string *s* should also find a value which is only s ...

Florian Müller
  • 7,448
  • 25
  • 78
  • 120
  • turns out that the wildcard solution is not a valid workaround for the `ft_min_word_len` setting. Best idea is to change this setting to 1 and restart mysql. Failing that you could write a workaround using `LIKE` but it won't work too well for more than one string & under 4 characters e.g. ('s s') – Tom Mac Oct 28 '11 at 07:47

2 Answers2

2

This might help you out. It does kinda assume that your MySQL table uses the MyISAM engine though:

create table blubberBlah (id int unsigned not null primary key auto_increment,
title varchar(50) not null,
description varchar(50) not null,
filename varchar(50) not null,
tags varchar(50)not null,
active tinyint not null
) engine=MyISAM;

insert into blubberBlah (title,description,filename,tags,active) 
values ('blah','blah','bdsai','bdha',1);
insert into blubberBlah (title,description,filename,tags,active) 
values ('blubber','blah','blah','adsb',1);
insert into blubberBlah (title,description,filename,tags,active) 
values ('blah','dsabsadsab','dnsa','dsa',1);

select t.*
from
(
 select MATCH (title) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (description) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (fileName) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (tags) AGAINST ('blubber blah' IN BOOLEAN MODE) as matchRank,
       bb.*
from blubberBlah bb
) t
order by t.matchRank desc;

EDIT

Another assumption that this solution makes is that the string that your searching for is >= 4 characters long. If there is a possibility that the 'search for' string i.e 'blubber' or 'blah' will be either 1, 2 or 3 characters long then you can always head to your my.cnf file and add ft_min_word_len=1 under the [mysqld] configuration options. Then restart MySQL and you should be good to go.

One final thing: if you are considering using this approach then you should add a FULLTEXT INDEX to each of the columns. Hence:

ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx1`(`title`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx2`(`description`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx3`(`filename`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx4`(`tags`);

You can find more details on BOOLEAN FULLTEXT searching in the MySQL Docs.

Tom Mac
  • 9,693
  • 3
  • 25
  • 35
  • which version of MySQL are you running? Do you get a matchRank value of 0 running the query against your table or the example table from my answer? – Tom Mac Oct 24 '11 at 14:27
  • i'm running 5.0.45 and i replaced the tablename with mine ^^ ("PAGES") – Florian Müller Oct 24 '11 at 15:24
  • can you try running the exact example from my answer i.e. with the exact data that you provided in your question, and let me know if it works or not? If it does work then I suspect that the issue lies with the data in your PAGES table – Tom Mac Oct 24 '11 at 15:34
  • Hell, it worked with your table ..... I've tried it like this: http://ykg-clan.de/tables.png – Florian Müller Oct 24 '11 at 21:40
  • With your table, everything works fine. If I tried with mine (look link on comment before) i get a wrong result. – Florian Müller Oct 24 '11 at 21:49
  • @Florian. Worked out what your problem is. You're searching with a one character string and my example used a 4 character ('blah') and a 7 character ('blubber') string. The default min length is 4 - anything less than this will return a match value of 0. I have amended my answer with a couple of suggestions to address this problem. Good luck! – Tom Mac Oct 25 '11 at 08:45
  • There was something which did not work - I tried it with '\*bla\*', but it does not care which string I input and always shows every record. Why? – Florian Müller Oct 26 '11 at 22:04
  • What do you mean 'did not work'? It will always show every record since there is no `where` clause. If you want to exclude, say, rows that have 0 matchRank then include a `where matchRank > 0` predicate. Furthermore, given the test data you described 'bla' is present in at least one of the columns in every row! Please add more detail to your question if you are going to write 'it did not work'! Your question does not specify that you wish to exclude rows with zero relevance.... – Tom Mac Oct 27 '11 at 06:41
  • Ok, sorry this wasn't a really good comment, I know. Well, I only want to show records with at least one match in it - the where clause does it, I forgot that. But now, if I use an example with only the letter `s` and a wildcard (so it looks like `*s*`) i get only one of three records. If you look at my answer, i've explained how my three test records do look like. – Florian Müller Oct 28 '11 at 07:13
1

Rather than searching 'in boolean mode', use Match() Against() to determine a score. Add those scores up to get relevance.

Robert Smith
  • 385
  • 4
  • 15