
I have a MongoDB database with collections holding millions of free-text records, and I want to support online queries on this text. I am considering:

  1. the "model data for keyword search" pattern: http://docs.mongodb.org/manual/tutorial/model-data-for-keyword-search/
  2. MongoDB 2.4's new text search feature
  3. Elasticsearch
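
For reference, option 1 boils down to storing an array of keywords next to the text and querying that array (which MongoDB can index as a multikey index). A minimal Python sketch of the idea, assuming a hypothetical `keywords` field, a tiny illustrative stopword list, and helper names that are not taken from the MongoDB tutorial:

```python
import re

# Illustrative stopword list (an assumption); a real one would be per language.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def extract_keywords(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"\w+", text.lower())
    return sorted({w for w in words if w not in STOPWORDS})

def to_document(text):
    """Build the document to store: the raw text plus its keyword array."""
    return {"text": text, "keywords": extract_keywords(text)}

def keyword_query(word):
    """Build a filter that matches documents whose keyword array contains the word."""
    return {"keywords": word.lower()}

doc = to_document("The quick brown fox jumps over the lazy dog")
print(doc["keywords"])
print(keyword_query("Fox"))
```

The document and filter are plain dicts, so they could be passed to a driver's insert and find calls. Note that this only gives exact keyword matches: there is no stemming or relevance ranking.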

First question: if I use Elasticsearch, I no longer need MongoDB, since Elasticsearch keeps the whole document. Am I right?

Second question/problem: the text in my documents may be in different languages. This seems to be a limitation of MongoDB 2.4, where you have to specify the language for the whole collection. Am I right? So I should either use solution 1 (model data) or first separate the text by language. Right?

Thanks for any comments or suggestions.

colin
  • Only you can answer your first question as to whether it meets your unspecified requirements. MongoDB text search is not ready for production usage; it's classified as experimental. – WiredPrairie Mar 29 '13 at 10:51

2 Answers


OK, I may have found a solution to the multi-language problem: http://docs.mongodb.org/manual/tutorial/create-text-index-on-multi-language-collection/ so I just need to specify the language of each document in a designated field.
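
A rough sketch of what that could look like from Python. The field names `text` and `language`, the sample documents, and the helper `create_text_index` are assumptions for illustration; the index options `default_language` and `language_override` are passed through to MongoDB by pymongo's `create_index`, but check them against your server and driver versions before relying on this.

```python
def with_language(text, language):
    """Attach the language to the document in a dedicated field."""
    return {"text": text, "language": language}

# Hypothetical sample documents in different languages.
docs = [
    with_language("The cat sat on the mat", "english"),
    with_language("Le chat est sur le tapis", "french"),
    with_language("El gato se sienta en la alfombra", "spanish"),
]

# Text index on the `text` field. Documents without a `language` field
# fall back to the default language.
TEXT_INDEX_KEYS = [("text", "text")]
TEXT_INDEX_OPTIONS = {"default_language": "english", "language_override": "language"}

def create_text_index(collection):
    """Create the text index; `collection` is assumed to be a pymongo Collection."""
    return collection.create_index(TEXT_INDEX_KEYS, **TEXT_INDEX_OPTIONS)

print([d["language"] for d in docs])
```

`create_text_index` is not called here because it needs a live connection; the rest runs on its own.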

Mongo rocks! Any comments or remarks?

colin
  • It's really hard to give you a good answer. It's clear that you need search, but not exactly what your search requirements are. If they are basic, Mongo might be a fit, but I guess Elasticsearch is a lot more flexible, made exactly for that purpose, and more performant. It's known that you can use Elasticsearch as a NoSQL store too; on the other hand, Mongo's full-text search capabilities are not proven yet, I guess. – javanna Mar 29 '13 at 10:01
  • Yes, Mongo's full-text search is still not proven. But duplicating the stored information by using both Mongo and Elasticsearch seems overkill. Since you seem to know Elasticsearch, a simple question: is it possible to get the list of most frequent words in a simple way? (So far I am aggregating the text and then using the collections.Counter class in Python.) – colin Mar 29 '13 at 11:25
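
For context, the client-side counting approach mentioned in the last comment might look roughly like this. The function name `most_common_words` and the sample texts are made up for illustration, and the naive `\w+` tokenization is an assumption.

```python
from collections import Counter
import re

def most_common_words(texts, n=3):
    """Count word occurrences across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts.most_common(n)

# Hypothetical sample data standing in for text fetched from the database.
texts = [
    "mongo stores text",
    "elastic search indexes text",
    "text search in mongo",
]
print(most_common_words(texts, 3))
```

This pulls every document to the client, so for millions of records it would be slow compared to letting the search engine or database do the aggregation.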

What is your app written in?

I ask because the Elasticsearch C# NEST client is not fun or easy to use, and the Elasticsearch documentation wasn't great when we set up our Elasticsearch cluster.

I have the process for setting up Elasticsearch on EC2 documented, if you want it.

We use MongoDB for aggregate querying and as a cache because it is fast, scales excellently, and is easy to set up.

The new MongoDB Free Text Search feature is interesting and worth a look, but it totally depends on your use case.

You can read more about MongoDB Free Text Search and see code examples in my blog post.

Also, depending on where you are hosting: if you are using Amazon Web Services, you could look at CloudSearch.

Robs