2

the task is to implement text search in MySQL in my project(PHP/Zend Framework 2 + MySQL). The issue is that text fields are not big at all, it is mostly VARCHAR fields or joined fields like city names, company names and so on, about 5-10 fields for each entity. So currently I decided to choose Lucene(zend framework 2 module - Zend Search), but will it be effective to use technologies like Lucene or Sphinx for small varchar fields?

Thank you.

Cassius
  • 153
  • 1
  • 14

1 Answers1

5

Sure, Lucene or Sphinx can work with any varchar columns that contain text.* They don't have to be huge.

Any fulltext indexing solution is hundreds or thousands of times better than using LIKE '%word%'!

You might be interested in my presentation, Fulltext Search Throwdown.

You can also watch a recording of me delivering that presentation as a webinar: http://www.percona.com/webinars/2012-08-22-full-text-search-throwdown (it's free but requires registration).

* Lucene and Sphinx can do some things with numeric columns too.

PS: I was the project lead on Zend Framework 1.0. Zend_Search_Lucene was an interesting experiment at circa 2007, but it's way outdated, relative to Apache Lucene/Solr, and Zend_Search_Lucene is orders of magnitude slower than the Java implementation. I wouldn't bother with it.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    Hi, Bill. Thank you for your comment and valuable presentation. I'm also looking forward to use/switch to percona server. – Cassius Jul 19 '13 at 23:07
  • @Bill Karwin has a good suggestion. I engineered the search engine that PRWeb had and we were dealing with only about 50 million documents but it was quite an improvement over using a RDBMS for searching. The *necessary* distinction is: search threads are still subject to row, page and table locks. So all searches could (at best) hiccough when you add/update/index a series of rows. This kind of contention is not necessary for your database but ads a lot of reader contention to your searched tables. – memnoch_proxy Jul 19 '13 at 23:27
  • Percona is a good choice, but I'd suspect their consultants would suggest you not combine an RDBMs with full text search unless you had a very small user base. – memnoch_proxy Jul 19 '13 at 23:29
  • 1
    @memnoch_proxy, I work for Percona, as a consultant and trainer. We do recommend fulltext search, if the alternative is using `LIKE` with wildcards. – Bill Karwin Jul 19 '13 at 23:44
  • Yes, Java implementation is faster than Zend_Search_Lucene, the question is are we trying to kill a butterfly with a cannon, regarding the scale of application and small few words containing fields.. – Cassius Jul 20 '13 at 21:02
  • After your table grows to a few tens of thousands of rows, you won't be happy with anything short of a fulltext search solution. It will just be too slow. But you could also use MyISAM tables for the searchable text, that would be much simpler than learning another tool like Solr or Sphinx. I would store the real data in InnoDB tables, and just copy the searchable text to a MyISAM table. I don't trust MyISAM to not corrupt my data. – Bill Karwin Jul 20 '13 at 23:21