
I have created a search project based on Lucene 4.5.1.

There are about 1 million documents, each a few KB in size, and I index them with the fields docname (stored), lastmodified, and content. The overall size of the index folder is about 1.7 GB.

I used one document (the original one) as a sample and queried the content of that document against the index. The problem is that each query comes back slowly. After some tests, I found that my queries are too large even though I removed stopwords, but I have no idea how to reduce the query string size. On top of that, the smaller the query string is, the less accurate the results become.

This is not limited to a specific file; I also tested with other original files, and search performance is consistently slow (often 1-8 seconds).

Also, I have tried copying the entire index directory into a RAMDirectory before searching, but that didn't help.
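For reference, a sketch of the RAMDirectory approach mentioned above (the index path is a placeholder):

```java
import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.RAMDirectory;

// Copy the on-disk index into memory, then open the reader/searcher on the copy
Directory fsDir = FSDirectory.open(new File("/path/to/index")); // placeholder path
Directory ramDir = new RAMDirectory(fsDir, IOContext.READ);
DirectoryReader reader = DirectoryReader.open(ramDir);
IndexSearcher searcher = new IndexSearcher(reader);
```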

In addition, I share a single IndexSearcher across multiple threads, but in testing I used only one thread as a benchmark. The expected response time would be a few ms.

So, how can I improve search performance in this case?

Hint: I'm retrieving the top 1000 hits.
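For completeness, a sketch of the search path described above (assuming a StandardAnalyzer and the content field; documentText stands in for the sample document's text):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;

// Build a query from the sample document's text and fetch the top 1000 hits
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_45);
QueryParser parser = new QueryParser(Version.LUCENE_45, "content", analyzer);
Query query = parser.parse(QueryParser.escape(documentText)); // whole document as query
TopDocs hits = searcher.search(query, 1000);
```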


1 Answer


If the number of fields is large, a nice solution is to not store them individually, but instead serialize the whole object into a single binary stored field.

The plus is that when projecting the object back out after a query, you read a single field rather than many. getField(name) iterates over the entire field list, so each lookup is linear in the number of fields (on average it scans half of them), and you pay that cost repeatedly while getting values and setting fields. With one binary field you do a single lookup and deserialize.
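A minimal sketch of this idea, assuming a Serializable domain class (a hypothetical MyDoc) and a binary field named blob:

```java
import java.io.*;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.util.BytesRef;

// Indexing: serialize the whole object into one stored binary field
ByteArrayOutputStream buf = new ByteArrayOutputStream();
try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
    out.writeObject(myDoc); // myDoc: your Serializable domain object
}
Document doc = new Document();
doc.add(new StoredField("blob", buf.toByteArray()));
// ... add the indexed (non-stored) fields used for searching ...

// Retrieval: one field lookup, then deserialize
BytesRef bytes = searcher.doc(scoreDoc.doc).getBinaryValue("blob");
try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(bytes.bytes, bytes.offset, bytes.length))) {
    MyDoc restored = (MyDoc) in.readObject();
}
```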

Second, it might be worth looking at something like a MoreLikeThis query. See https://stackoverflow.com/a/7657757/277700
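A rough sketch of how that could replace the full-text query, letting MoreLikeThis pick a small set of distinctive terms instead of sending the whole document (the tuning values and documentText are illustrative):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queries.mlt.MoreLikeThis;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;

MoreLikeThis mlt = new MoreLikeThis(reader); // reader: your DirectoryReader
mlt.setAnalyzer(new StandardAnalyzer(Version.LUCENE_45));
mlt.setFieldNames(new String[] { "content" });
mlt.setMaxQueryTerms(25);  // cap the query size; tune as needed
mlt.setMinTermFreq(2);     // skip terms that are rare within the document
mlt.setMinDocFreq(5);      // skip terms that are rare in the index

Query query = mlt.like(new StringReader(documentText), "content");
TopDocs hits = searcher.search(query, 1000);
```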
