
We've got an application that needs to perform very fast searches on ~2 million records.

Each search needs to cover both a large free-text field and a number of integer/decimal fields filtered by range, along with various functions/computations and sorting.

Currently, we're handling this with a big MSSQL database, using the built-in freetext engine, and some replication to move the load off the transactional tables.

However, as you may have guessed, this solution isn't the most scalable.

I've written up a little Lucene-based document store, and am generally quite impressed by the results, with text searches taking not much longer than half a second (on 100k records).

The hard part is the parametric searching. I'm aware Lucene does basic range matching, but I feel we need something more powerful.

I've made a little test database using db4o, which has powerful query capabilities, but these queries are quite slow: over 15 seconds on only 100k records, whereas SQL takes about 1.5 seconds for the free-text and parametric searches.

Also, our database needs an update resolution of less than 10 minutes, with approximately 15% of the records changing on a daily basis. Our SQL Server is handling this currently, but it is starting to creak.

Any guidance on suitable technologies & approaches would be appreciated.

Cheers, Dave

Dave Bish
  • Could you clarify what you mean by "need something more powerful." about Lucene parametric searching? It is quite powerful and should accommodate most query requirements. – Mikos Jul 21 '10 at 11:11
  • db4o taking 15 secs on 100k objects doesn't sound like our database. Maybe you would like to post your code to the db4o forums? http://developer.db4o.com/Forums.aspx I am sure something can be done for speed. – Carl Rosenberger Jul 21 '10 at 14:49
  • Why not use SQL Server's own Full Text Search? – Panagiotis Kanavos Jul 22 '10 at 16:17

1 Answer


LinkedIn wrote an add-on to Lucene called bobo that expands its faceted search queries, which might be worth looking into. But I think bobo is really only needed if you have an absolutely massive index; there must be something really weird going on if a search on 100k documents is taking that long.
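
To give a rough sense of why a combined free-text and range search over 100k records should not need seconds, here is a minimal sketch of the general technique: an inverted index for the text plus a sorted list for one numeric field, intersected at query time. This is not Lucene's or bobo's API. The `TinyIndex` class, its method names, and the `price` field are all made up for illustration, and a real engine adds scoring, analysis, and on-disk storage on top of this.

```python
import bisect
from collections import defaultdict


class TinyIndex:
    """Toy in-memory index (hypothetical, for illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self._pairs = []                  # (price, doc_id), sorted on commit
        self._values = []                 # prices only, parallel to _pairs

    def add(self, doc_id, text, price):
        # Naive tokenisation: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self._pairs.append((price, doc_id))

    def commit(self):
        # Sort once after a batch of adds so range lookups can binary-search.
        self._pairs.sort()
        self._values = [value for value, _ in self._pairs]

    def range_ids(self, low, high):
        # Doc ids with low <= price <= high, located by binary search.
        start = bisect.bisect_left(self._values, low)
        end = bisect.bisect_right(self._values, high)
        return {doc_id for _, doc_id in self._pairs[start:end]}

    def search(self, query, low, high):
        terms = query.lower().split()
        if not terms:
            return []
        # Intersect posting sets, smallest first, then apply the range filter.
        sets = sorted((self.postings.get(t, set()) for t in terms), key=len)
        hits = sets[0].intersection(*sets[1:])
        return sorted(hits & self.range_ids(low, high))
```

After adding documents and calling `commit()`, something like `index.search("red sofa", 250, 500)` returns the ids that contain every query term and whose `price` falls in the range. Each term lookup is a dictionary access and the range lookup is two binary searches, which is why timings in the tens of seconds on 100k records usually point at something other than the index structure itself.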

Xodarap