
I am trying to add Lucene.NET to my project, where searches over the data are getting more complicated. However, the data is transactional: the table is modified frequently (new rows are inserted, or fields that are used in the Lucene index are modified).

Is Lucene.NET a good fit for searching here?

How can I find modified fields and update the corresponding documents in a Lucene index that has already been created? And if the Lucene index contains documents whose rows have been deleted from the table, how can I remove them from the index?

Right now, while loading the page, I:

  1. remove index documents whose unique field no longer exists in the table
  2. insert a document if it does not exist in the index; otherwise, update every document that matches the table's unique field

Because of these remove/insert/update calls, the page takes longer than normal to load.
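To make the cost visible, here is a minimal Python sketch of the two loading steps described above. It is not Lucene.NET code: the table and the index are both modeled as plain dicts keyed by the unique field, and `reconcile` is a hypothetical helper name. The point it illustrates is that every call walks every row and rewrites every matching document, even ones that never changed, so running it on each page load does work proportional to the whole table.

```python
def reconcile(table_rows, index_docs):
    """Bring index_docs in line with table_rows, the way the loading routine does.

    table_rows: dict mapping unique key -> row fields (stands in for the DB table)
    index_docs: dict mapping unique key -> indexed fields (stands in for the Lucene index)
    Returns (deleted, inserted, updated) counts.
    """
    # Step 1: remove documents whose unique key no longer exists in the table.
    stale = [key for key in index_docs if key not in table_rows]
    for key in stale:
        del index_docs[key]

    inserted = updated = 0
    # Step 2: insert missing documents, update all the others.
    # This loop visits every row, including rows that have not changed.
    for key, row in table_rows.items():
        if key in index_docs:
            updated += 1
        else:
            inserted += 1
        index_docs[key] = dict(row)
    return len(stale), inserted, updated
```

With a table of 100,000 rows and one changed row, the "updated" count is still about 100,000. The answers below avoid this by only touching rows that actually changed.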

How should I proceed?

James Z
Sharanamma Jekeen

2 Answers


Lucene is absolutely suited for this type of feature. It is completely thread-safe... IF you use it the right way.

Solution pointers

Create a single IndexWriter and keep it in a globally accessible singleton (either a global static variable or via dependency injection). IndexWriters (IWs) are completely thread-safe. NEVER open multiple IWs on the same folder.

Perform all updates/deletes via this singleton. (I had one project doing hundreds of ops/second with no issues, even on slightly crappy hardware.)
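The "one writer per process" rule can be sketched as a lazily created singleton. This is a Python illustration under assumptions, not the Lucene.NET API. `FakeIndexWriter`, `update_document`, `delete_documents` and `get_writer` are hypothetical names. The update method mimics the delete-then-add semantics of Lucene's `IndexWriter.UpdateDocument(Term, Document)`, and documents are keyed by the table's unique field.

```python
import threading


class FakeIndexWriter:
    """Hypothetical stand-in for Lucene's IndexWriter: documents keyed by a unique id."""

    def __init__(self, path):
        self.path = path
        self.docs = {}
        self._lock = threading.Lock()  # a real IndexWriter does its own synchronization

    def update_document(self, key, doc):
        # Like UpdateDocument(term, doc): drop any document with this key, then add the new one.
        with self._lock:
            self.docs[key] = doc

    def delete_documents(self, key):
        # Like DeleteDocuments(term): remove the document with this key, if any.
        with self._lock:
            self.docs.pop(key, None)


_writer = None
_writer_lock = threading.Lock()


def get_writer(path="index"):
    """Return the one process-wide writer, creating it on first use."""
    global _writer
    if _writer is None:
        with _writer_lock:
            if _writer is None:  # re-check under the lock so two threads never create two writers
                _writer = FakeIndexWriter(path)
    return _writer
```

Every request handler calls `get_writer()` instead of constructing its own writer. In C#, a static `Lazy<IndexWriter>` or a singleton registration in the DI container plays the same role.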

Depending on the frequency of change and the latency acceptable to the app, you could:

  • Send an update/delete to the index every time you update the DB
  • Keep a "transaction log" or queue (probably in the same DB) of changed rows and deletions (which are hard to track otherwise). Then update the index by consuming the log/queue.
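The second option, consuming a change log, might look like the following Python sketch. It assumes a hypothetical log table where each entry has a monotonically increasing sequence number, an operation (`"upsert"` or `"delete"`), the row's unique key and, for upserts, the fields to index. The `Change` and `consume_log` names are invented for illustration, and the index is modeled as a dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    seq: int                        # increasing id assigned by the (hypothetical) log table
    op: str                         # "upsert" or "delete"
    key: int                        # the row's unique field
    fields: Optional[dict] = None   # indexed fields, only for upserts


def consume_log(log, index_docs, last_seq):
    """Apply every change with seq > last_seq to index_docs and return the new last_seq."""
    for change in sorted(log, key=lambda c: c.seq):
        if change.seq <= last_seq:
            continue  # already applied in an earlier run
        if change.op == "upsert":
            index_docs[change.key] = dict(change.fields)
        elif change.op == "delete":
            index_docs.pop(change.key, None)
        else:
            raise ValueError(f"unknown op {change.op!r}")
        last_seq = change.seq
    return last_seq
```

Persisting `last_seq` (for example, next to the log) lets the consumer resume after a restart. Deletions are captured because the log records them explicitly, which a plain "modified since" query on the table cannot do.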

To search, create your IndexSearcher with searcher = new IndexSearcher(writer.GetReader()). This is part of the NRT (near real time) pattern. NEVER create a separate IndexReader on an index folder that is also open by an IW.

Depending on your pattern of usage you may wish to introduce a period of "latency" between changes happening and those changes being "visible" to the searches...

IndexSearcher (IS) instances are also thread-safe, so you can keep a single IS instance through which all your searches go. Recreate it periodically (e.g. with a timer) and swap it in using Interlocked.Exchange.
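The refresh-and-swap idea can be sketched in Python as follows, under stated assumptions. `Snapshot` stands in for an IndexSearcher over a point-in-time reader, and `SearcherHolder` and `refresh_every` are hypothetical names. In CPython, rebinding `self.current` is a single reference assignment, which is the role `Interlocked.Exchange` plays in C#: searches already in flight keep the old snapshot, and new searches get the new one. The gap between refreshes is the "latency" mentioned above.

```python
import threading


class Snapshot:
    """Hypothetical point-in-time searcher: a frozen copy of the documents."""

    def __init__(self, docs):
        self._docs = dict(docs)

    def search(self, word):
        return sorted(k for k, text in self._docs.items() if word in text.split())


class SearcherHolder:
    def __init__(self, writer_docs):
        self._writer_docs = writer_docs      # the live data the writer keeps changing
        self.current = Snapshot(writer_docs)

    def refresh(self):
        # Build the new searcher off to the side, then publish it with one assignment.
        new = Snapshot(self._writer_docs)
        old, self.current = self.current, new
        return old  # like Interlocked.Exchange, hand back the previous instance

    def search(self, word):
        return self.current.search(word)


def refresh_every(holder, seconds, stop_event):
    """Call holder.refresh() every `seconds` until stop_event is set (run in a background thread)."""
    while not stop_event.wait(seconds):
        holder.refresh()
```

With real Lucene.NET, the old searcher's reader must eventually be released once no search is using it. This sketch ignores that cleanup.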

I previously created a small framework to isolate this from the app and make it reusable.

Caveat

Having said that, hosting this inside IIS does raise some problems. IIS will occasionally restart your app. It will also (by default) start the new instance before stopping the existing one, then swap them (so you don't see the startup time of the new one).

So, for a short time, there will be two instances of the writer (which is bad!).
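Lucene protects an index folder with a `write.lock` file, so during an overlapping recycle the new instance cannot open its writer while the old one still holds the lock. The Python sketch below shows only the general exclusive-lock idea, using an `O_CREAT | O_EXCL` lock file. Lucene's actual locking depends on the configured lock factory, and `WriteLock` is a hypothetical class. This sketch also does not handle a crashed process leaving the file behind.

```python
import os
import tempfile


class WriteLock:
    """Hypothetical exclusive lock on an index folder, illustrated with an O_EXCL lock file."""

    def __init__(self, folder):
        self.path = os.path.join(folder, "write.lock")
        self.fd = None

    def acquire(self):
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one holder can win.
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return True
        except FileExistsError:
            return False

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        old_instance, new_instance = WriteLock(folder), WriteLock(folder)
        print(old_instance.acquire())   # True: the running app holds the writer
        print(new_instance.acquire())   # False: the overlapping app cannot open a second writer
        old_instance.release()
        print(new_instance.acquire())   # True once the old instance has shut down
        new_instance.release()
```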

You can tell IIS to disable "overlapping" or increase the time between restarts. But this will cause other side-effects.

So you are actually better off creating a separate service to host your Lucene bits. A simple self-hosted WebAPI Windows service is ideal and pretty simple. This also gives you better control over where the index folder goes and the ability to host it on a different machine (which isolates the disk I/O load). It also means the service can be accessed from other parts of your system, tested separately, etc.
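In the setup described above, the service would be a self-hosted WebAPI Windows service in C#. The Python sketch below only illustrates the API shape such a service might expose while owning the single writer. The routes (`PUT /documents/{id}`, `DELETE /documents/{id}`, `GET /search?q=`) and the `SearchService` class are invented for illustration. The "index" is a dict, and "search" is a case-insensitive substring match.

```python
import json
from urllib.parse import parse_qs, urlsplit


class SearchService:
    """Hypothetical API surface of a standalone search service that owns the only writer."""

    def __init__(self):
        self.docs = {}  # stands in for the index this service owns exclusively

    def handle(self, method, path, body=None):
        """Route one request and return (status_code, json_payload_or_None)."""
        url = urlsplit(path)
        parts = [p for p in url.path.split("/") if p]
        if method == "PUT" and len(parts) == 2 and parts[0] == "documents":
            self.docs[parts[1]] = json.loads(body)   # add or replace one document
            return 204, None
        if method == "DELETE" and len(parts) == 2 and parts[0] == "documents":
            self.docs.pop(parts[1], None)            # deleting a missing id is not an error
            return 204, None
        if method == "GET" and parts == ["search"]:
            q = parse_qs(url.query).get("q", [""])[0].lower()
            hits = sorted(k for k, d in self.docs.items() if q in d.get("text", "").lower())
            return 200, json.dumps(hits)
        return 404, None
```

The web app then only makes HTTP calls, so IIS recycles no longer affect who holds the writer.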

Why is this "better" than one of the other services suggested?

It's a matter of choice. I am a huge fan of ElasticSearch. It solves a lot of problems around scale and resilience. It also uses the latest version of Java Lucene which is far, far ahead of lucene.net in terms of capability and performance. (The same goes for the other two).

BUT, ES and Solr are Java (which may or may not be an issue for you). AzureSearch is hosted in Azure which again may or may not be an issue.

All three will require climbing a learning curve and will require infrastructure support or external third party SaaS commitment.

If you keep the service in-house and in C#, it stays simple: you have control over the capabilities, and the shape of the API can be tuned to your needs.

No "right" answer. You'll have to make choices based on your situation.

AndyPook

You should preferably be indexing on a schedule (periodically). The easiest approach is to store the date of the last indexing run, then query for all changes since then: index new records, update changed ones, and remove deleted ones. To keep track of entries removed from the database, you will need a log of deleted records with the date each one was removed. You can then query that log by date to find what needs to be removed from the Lucene index.

Now simply run that job every 2 minutes or so.
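The timestamp-based job can be sketched in Python as below. The sketch assumes each table row carries a last-modified timestamp and that a hypothetical deletion-log table records `(key, deleted_at)`. In a real job, both filters would be SQL `WHERE` clauses rather than Python loops. `sync_since`, `next_run_time` and `SYNC_INTERVAL` are invented names, and the index is again a dict.

```python
from datetime import datetime, timedelta

# The answer suggests running the job "every 2 minutes or so".
SYNC_INTERVAL = timedelta(minutes=2)


def sync_since(rows, deleted_log, index_docs, last_run: datetime, now: datetime) -> datetime:
    """Apply changes made in (last_run, now] to index_docs and return `now` for the next run.

    rows:        iterable of (key, modified_at, fields) from the table
    deleted_log: iterable of (key, deleted_at) from the deletion log
    """
    # Apply deletions before upserts, so a key that was deleted and then
    # re-inserted within the same window ends up indexed.
    for key, deleted_at in deleted_log:
        if last_run < deleted_at <= now:
            index_docs.pop(key, None)
    for key, modified_at, fields in rows:
        if last_run < modified_at <= now:
            index_docs[key] = dict(fields)   # new or updated row
    return now


def next_run_time(last_run: datetime) -> datetime:
    return last_run + SYNC_INTERVAL
```

A reasonable precaution, not from the answer: capture `now` before running the queries, and persist it as the new "last run" only after the index changes are committed, so rows modified during a run are picked up by the next one.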

That said, Lucene.NET is not really suited for web applications; you should consider using ElasticSearch, SOLR or AzureSearch: basically, a server that can handle load and multithreading better.

Woland
  • The first part of your response is great. It sounds like the asker is re-indexing every time the page loads, which completely invalidates any benefit you get from using Lucene. However, I don't understand your point that Lucene.Net is not suited for a web application. Using ElasticSearch, SOLR or AzureSearch would potentially complicate the application if it is a .NET app. A background process on the server can manage the index with any of these technologies, including Lucene.net – JLo Jan 28 '16 at 12:26
  • Lucene is file-based search; it is not suited for multithreaded operations and can cause issues in a live environment with locked or blocked files. – Woland Jan 28 '16 at 19:09
  • Also, even though Elasticsearch would complicate the application, it is the correct thing to do and will make life easier in the future. I don't know much about the application's demands, like performance and availability, but even basic availability will be hard to achieve, as Lucene is next to impossible to deploy on two load-balanced servers. – Woland Jan 28 '16 at 22:40
  • According to this answer, the above isn't true? http://stackoverflow.com/questions/193624/does-lucene-net-manage-multiple-threads-accessing-the-same-index-one-indexing-w In our experience, multiple IIS W3WP threads are happily able to read from a Lucene index. I think your comments are misleading on that point. – JLo Jan 29 '16 at 00:37
  • @JLo We used Lucene.net in production environments, and if you have multiple indexers updating and optimizing indexes at the same time, problems can arise (they might not be hard crashes, but performance-related). The article you pointed to talks about the Java-based implementation, which is the original, not .NET; there is a difference. In any case, Lucene.net is not designed to be used in a web environment, especially not in a web farm where availability is important. – Woland Jan 29 '16 at 05:55