
I know there are several similar questions, but they don't give a simple answer to the problem at hand. Sorry if this feels like a duplicate, but I think a clear and understandable answer would benefit many people. So, to the question:

Can Solr indexing updates be automated? And if they can, what would be the optimal way to do it?

Here is a simple use case to clarify the question: I have a database table with several columns holding different kinds of data. A web app is used to manage that data. I have a separate Solr server that indexes specific columns of that table. How can I make it so that when users add, remove or modify data in the table, Solr notices the change and updates the index?

The updates need to be "real time", meaning the changes should show up in the index within a few seconds. With large amounts of data it could of course take longer.

Thanks in advance

frustrated

2 Answers


There are two questions here:

Can Solr indexing updates be automated?

Yes, they can, and they should always be automated. You don't want to launch the indexing process manually for every change.
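If the index is built with DataImportHandler (DIH), as a data-config.xml setup suggests, one simple way to automate it is to trigger a delta-import on a schedule, for example from cron or a small script. The following is a minimal sketch, assuming a core named `mycore` on `localhost:8983` with DIH registered at `/dataimport` and a `deltaQuery` already defined in data-config.xml; the URL and the function names are hypothetical. DIH was deprecated in Solr 8.6 and removed from core Solr in 9.0, so this only applies to versions that still ship it.

```python
import time
import urllib.parse
import urllib.request

# Hypothetical core URL; DIH must be registered at /dataimport in solrconfig.xml.
DIH_URL = "http://localhost:8983/solr/mycore/dataimport"


def delta_import_url(base=DIH_URL):
    # command=delta-import indexes only rows the deltaQuery reports as changed
    # since the last import; commit=true makes them searchable afterwards.
    params = {"command": "delta-import", "commit": "true"}
    return base + "?" + urllib.parse.urlencode(params)


def run_scheduler(interval_s, iterations, fetch=None):
    """Trigger a delta-import every interval_s seconds, `iterations` times.

    `fetch` can be swapped out (e.g. in tests); by default it performs an
    HTTP GET against Solr and returns the raw response body.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
    results = []
    for i in range(iterations):
        results.append(fetch(delta_import_url()))
        if i < iterations - 1:
            time.sleep(interval_s)  # wait before polling the database again
    return results
```

Polling with a short interval only approximates real time: each run still has to execute the `deltaQuery` against the database, so the interval should not be shorter than a typical import takes.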

It would be necessary for it to be "real time".

I already mentioned some ways to reduce the latency between changing the data and updating the index in this answer. You could use autoCommit to make sure that your data is committed within x seconds of the update. Depending on the interval, you'd want to reduce autowarming and adjust other settings; see this for more details.
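autoCommit itself is configured server-side in solrconfig.xml. A closely related per-request option is Solr's `commitWithin` parameter, which asks Solr to commit a given update within N milliseconds. Below is a minimal sketch, using only the Python standard library, of application hooks that push each insert, update or delete to Solr as it happens (the event-driven approach) instead of waiting for a batch import. It assumes a Solr 4 or later `/update` handler that accepts JSON, a core named `mycore` on `localhost:8983`, and a string uniqueKey field named `id`; those names and all helper functions are hypothetical.

```python
import json
import urllib.request

# Hypothetical core URL; replace host, port and core name with your own.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def _post_json(body, commit_within_ms):
    """Build a JSON POST to the update handler, asking Solr to commit within N ms."""
    url = f"{SOLR_UPDATE_URL}?commitWithin={commit_within_ms}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_for_upsert(row, commit_within_ms=2000):
    """Call after a row is inserted or updated.

    Adding a document whose uniqueKey already exists replaces the old version,
    so the same call covers both inserts and modifications.
    """
    doc = dict(row)
    doc["id"] = str(row["id"])  # assumes a string uniqueKey field named "id"
    return _post_json([doc], commit_within_ms)  # a JSON array of docs means "add these"


def request_for_delete(row_id, commit_within_ms=2000):
    """Call after a row is deleted from the table."""
    return _post_json({"delete": {"id": str(row_id)}}, commit_within_ms)


def send(req):
    """Send a prepared request; urlopen raises HTTPError on a non-2xx reply."""
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

In the web app you would call `send(request_for_upsert(row))` after a successful save and `send(request_for_delete(row_id))` after a delete. This keeps latency low at the cost of one HTTP round trip per write; if writes are frequent, batching several documents into one array is a cheap improvement.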

Also keep an eye on the NRT wiki page for related information and solutions about this.

Mauricio Scheffer
  • Thanks for the clear answers and the links to more resources. I'll investigate the information in depth to see how it could be implemented in my scenario. – frustrated Aug 11 '11 at 05:34
  • I've got one more question about the "real time" part. In your first link you point out that ORM events and similar features can be used to trigger the indexing. I use the Solr API for this, so it seems like the best option. Basically, my data-config.xml defines that the table data from the DB is one document and each row is an entity. Can I perform updates to the index at the entity level? I recall reading that it is only possible at the document level. – frustrated Aug 11 '11 at 06:01
  • @frustrated: I don't think you can mix DIH and ORM events. – Mauricio Scheffer Aug 11 '11 at 12:40

You may want to take a look at Apache Solr 3.3 with RankingAlgorithm 1.2. It supports NRT (Near Real Time Indexing) and can update 10,000 docs / sec. You can concurrently search during the updates. You do not need to commit or close the searchers. You can get more information about NRT with Solr 3.3 with RankingAlgorithm from here:

http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x

user925543