Questions tagged [lucene]

The term "Lucene" refers to the open source Java full-text search engine library, and also to the entire ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene. "Lucene" may also be used to refer to top-level projects like Nutch and Tika, which were once subprojects of Lucene.

Use the "Lucene" tag if any of the following applies:

  • The question is about the Java library
  • The question is about a port of the library but would make sense to people who know the Java library (many Lucene.NET questions meet this criterion).
  • The question is so general it doesn't apply to a specific implementation (example).

References:

Basic Demo:

A basic "getting started" demo showing how to build and query an index is provided as part of the official documentation:

Basic Demo documentation (this link is for Lucene v8.7.0; newer versions may be available)

Links to the demo's source files are provided in the above documentation.

The source code can also be found here on GitHub.
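The demo itself builds its index with Lucene's IndexWriter and queries it with IndexSearcher. As a dependency-free illustration of the underlying idea only, the sketch below builds a tiny in-memory inverted index that maps each term to the ids of the documents containing it, then answers queries with OR ("any term") and AND ("every term") semantics. The class ToyInvertedIndex and its methods are hypothetical and are not part of Lucene; it assumes a crude lower-case/split tokenizer in place of a real Lucene analyzer and does no scoring or on-disk storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical toy inverted index (term -> ids of documents containing it).
 * Illustrative only: this is NOT Lucene's API and has no scoring,
 * real text analysis, or on-disk segments.
 */
public class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    /** "Build" step: tokenize the text and record the new document id under each term. */
    public int add(String text) {
        int id = nextId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Crude stand-in for an analyzer: lower-case, then split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Posting list for one term, or an empty set if the term was never indexed. */
    private Set<Integer> postingsFor(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    /** "Query" step with OR semantics: union of the posting lists of all query terms. */
    public Set<Integer> searchAny(String query) {
        Set<Integer> result = new TreeSet<>();
        for (String term : tokenize(query)) {
            result.addAll(postingsFor(term));
        }
        return result;
    }

    /** "Query" step with AND semantics: intersection of the posting lists of all query terms. */
    public Set<Integer> searchAll(String query) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) {
            return new TreeSet<>();
        }
        // Copy the first posting list so the index itself is never modified.
        Set<Integer> result = new TreeSet<>(postingsFor(terms.get(0)));
        for (String term : terms.subList(1, terms.size())) {
            result.retainAll(postingsFor(term));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Lucene is a search library");        // id 0
        index.add("Solr is built on Lucene");           // id 1
        index.add("Elasticsearch is a search server");  // id 2
        System.out.println(index.searchAny("lucene search")); // prints [0, 1, 2]
        System.out.println(index.searchAll("lucene search")); // prints [0]
    }
}
```

Running main prints [0, 1, 2] for the OR query and [0] for the AND query. In Lucene itself, the classic QueryParser combines terms with OR by default; calling setDefaultOperator(QueryParser.Operator.AND) on the parser makes every term required.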

Luke - a Lucene GUI Client:

Luke is a GUI client application which can be used to explore your Lucene indexes. Recent versions of Luke are now provided as part of each binary release, which can be downloaded from here.

After downloading the binary release, unzip it and go to the luke directory. Launch the client using the provided luke.bat (Windows) or luke.sh (Linux/macOS) script.

11,993 questions

16 votes, 5 answers

How to make Lucene match all words in query?

I am using Lucene to allow a user to search for words in a large number of documents. Lucene seems to default to returning all documents containing any of the words entered. Is it possible to change this behaviour? I know that '+' can be used to…
paul (reputation 13,312)

16 votes, 3 answers

how do I normalise a solr/lucene score?

I am trying to work out how to improve the scoring of solr search results. My application needs to take the score from the solr results and display a number of “stars” depending on how good the result(s) are to the query. 5 Stars = almost/exact…
Grant Collins (reputation 1,781)

16 votes, 3 answers

no segments* file found

I need to access a lucene index (created by crawling several webpages using Nutch) but it is giving the error shown above: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/home/: files: at…
crazyaboutliv (reputation 3,029)

16 votes, 3 answers

Find all Lucene documents having a certain field

I want to find all documents in the index that have a certain field, regardless of the field's value. If at all possible using the query language, not the API. Is there a way?
Michael Böckling (reputation 7,341)

16 votes, 4 answers

How to query lucene with "like" operator?

The wildcard * can only be used at the end of a word, like user*. I want to query with a like %user%, how to do that?
Freewind (reputation 193,756)

16 votes, 7 answers

Recommendations for a spidering tool to use with Lucene or Solr?

What is a good crawler (spider) to use against HTML and XML documents (local or web-based) and that works well in the Lucene / Solr solution space? Could be Java-based but does not have to be.
BuddyJoe (reputation 69,735)

16 votes, 1 answer

Difference(s) between Solr's Cursor and ElasticSearch's Scroll

While looking for pagination with Solr and ElasticSearch, it turned out, both have the same "problem" (deep pagination, especially with shards). Though both search engines provide a solution/workaround for that: Solr: cursor…
Benjamin M (reputation 23,599)

16 votes, 4 answers

Logging Search Keywords in Solr / Lucene

I'm new to Solr and am looking for a way to record searches (or keywords) to a log file or database so that I can then analyse for data visualisation. Can Solr do this already? Is this data accessible via a Solr query? Thanks. Update 1 I'm…
Ryall (reputation 12,010)

16 votes, 1 answer

Very slow Solr performance when highlighting

I have a Solr 4.4.0 core configured that contains about 630k documents with an original size of about 10 GB. Each of the fields gets copied to the text field for purposes of queries and highlighting. When I execute a search without highlight, the…
Jason (reputation 2,806)

16 votes, 3 answers

Does Lucene.Net manage multiple threads accessing the same index, one indexing while the other is searching?

When using Lucene.Net with ASP.NET, I can imagine that one web request can trigger an update to the index while another web request is performing a search. Does Lucene.Net have built in it the ability to manage concurrent access, or do I have to…
Corey Trager (reputation 22,649)

16 votes, 1 answer

Relevance feedback in Apache Solr

I would like to implement relevance feedback in Solr. Solr already has a More Like This feature: Given a single document, return a set of similar documents ranked by similarity to the single input document. Is it possible to configure Solr's More…
snakile (reputation 52,936)

16 votes, 2 answers

Solr DIH -- How to handle deleted documents?

I'm playing around with a Solr-powered search for my webapp, and I figured it'd be best to use the DataImportHandler to handle syncing with the app via the database. I like the elegance of just checking the last_updated_date field. Good stuff. …
Brandon Yarbrough (reputation 37,021)

16 votes, 2 answers

Enabling soundex/metaphone for non-English characters

I've been studying soundex, metaphone and other string search techniques the past few days, and in my understanding both algorithms work well in handling non-English words transliterated to English. However the requirement that I have would be for…
Jon Limjap (reputation 94,284)

16 votes, 7 answers

Full-text search for static HTML files on CD-Rom via javascript

I will be delivering a set of static HTML pages on CD-Rom; these pages need to be fully viewable with no Internet access whatsoever. I'd like to provide a full-text search (Lucene-like) for the content of those pages, which should "just work" from…
Bambax (reputation 2,920)

16 votes, 3 answers

Synonyms using Lucene

What is the best way to handle synonyms (phrases) using Lucene? Especially, when I need to execute queries like: a OR b OR c NOT d How about adding a new field called "synonyms" to each document while indexing? This field's value would have a list…
Ed. (reputation 1,654)