0

I want to develope a site for announcing jobs, but because I have a lot of conditions (title,category,tags,city..) I use a MySQL regex statement. However, it's very slow and sometimes results in a 500 internal Server Error

Here is one example :

select * from job 
where 
( LOWER(title) REGEXP 'dév|freelance|free lance| 3eme   grade|inform|design|site|java|vb.net|poo ' 
or 
LOWER(description) REGEXP 'dév|freelance|free lance| 3eme grade|inform|design|site|java|vb.net|poo '
 or
 LOWER(tags) REGEXP 'dév|freelance|free lance| 3eme grade|inform|design|site|java|vb.net|poo') 
and 
LOWER(ville) REGEXP LOWER('Agadir') 
and 
`date`<'2016-01-11' 
order by `date` desc 

Any advice?

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • You should think about a redesign of your database. It is not common to Do so much regexpressions. Maybe you can work with a table for "tags" or something similar. – etalon11 Jan 11 '16 at 19:07
  • i think for that but o have other prob i must search in title and in city name.. – user3194762 Jan 11 '16 at 19:15
  • i have one other prob i import data from other sites i cant write tag for line by line – user3194762 Jan 11 '16 at 19:16
  • What about [Full-Text Search](http://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html)? You could [add an index](http://dev.mysql.com/doc/refman/5.7/en/alter-table.html) and search over several columns. Also see [Boolean Full-Text Search](https://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html). – bobble bubble Jan 11 '16 at 19:24

2 Answers2

1

You can't optimize a query based exclusively on regexes. Use full text indexing (or a dedicated search engine such as Mnogo) for text search and geospatial indexing for locations.

symcbean
  • 47,736
  • 6
  • 59
  • 94
1

The big part of the WHERE, namely the OR of 3 REGEXPs cannot be optimized.

LOWER(ville) REGEXP LOWER('Agadir') can be turned into simply ville REGEXP 'Agadir' if your collation is ..._ci. Please provide SHOW CREATE TABLE job.

Then that can be optimized to ville = 'Agadir'.

But maybe this query is "generated" by your UI? And the users are allowed to use regexp thingies? (SECURITY WARNING: SQL injection is possible here!)

If it is "generated", the generate the "=" version if there are no regexp codes.

Provide these:

INDEX(ville, date) -- for cases when you can do `ville = '...'`
INDEX(date)        -- for cases when you must have `ville REGEXP '...'`

The first will be used (and reasonably optimal) when appropriate. The second is better than nothing. (It depends on how many rows have that date range.)

It smells like there may be other SELECTs. Let's see some other variants. What I have provided here may or may not help with them.

See my indexing cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Rick James
  • 135,179
  • 13
  • 127
  • 222