Questions tagged [lucene]

The term "Lucene" refers to the open-source Java full-text search engine library, and also to the ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene.

The term "Lucene" refers to the open-source Java full-text search engine library, and also to the ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene. "Lucene" may also be used to refer to top-level projects such as Nutch and Tika, which were once sub-projects of Lucene.

Use the "Lucene" tag if any of the following applies:

  • The question is about the Java library.
  • The question is about a port of the library, but would make sense to people who know the Java library (many Lucene.NET questions match these criteria).
  • The question is so general it doesn't apply to a specific implementation (example).

References:

Basic Demo:

A basic "getting started" demo showing how to build and query an index is provided as part of the official documentation:

Basic Demo documentation (this link is for Lucene v8.7.0; newer versions may be available).

Links to the demo's source files are provided in the above documentation.

The source code can also be found here on GitHub.
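For readers who want a feel for what "build and query an index" means before opening the demo, here is a toy sketch in plain Java. It is not Lucene code and not the official demo: the class `ToyIndex` and its `add`, `search` and `get` methods are hypothetical names invented for this illustration, and the whole thing is a simplified in-memory inverted index with no scoring, persistence or real analysis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: NOT Lucene's API, only an illustration of the idea.
class ToyIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Very rough "analysis": lowercase, then split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Build step: store the document and record each of its terms. Returns the new document id.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Query step: ids of documents that contain every term of the query (AND semantics).
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    // Fetch the original text of a document by id.
    String get(int id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Lucene is a full-text search library");
        index.add("Luke is a GUI client for Lucene indexes");
        for (int id : index.search("lucene gui")) {
            System.out.println(id + ": " + index.get(id));
        }
    }
}
```

In this sketch the query "lucene gui" matches only the second document, because both terms must be present. The real library adds analyzers, relevance scoring, on-disk segments and a rich query API on top of this basic build-then-query shape.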

Luke - a Lucene GUI Client:

Luke is a GUI client application for exploring your Lucene indexes. Recent versions of Luke are provided as part of each Lucene binary release, which can be downloaded from here.

After downloading the binary release, unzip it, and go to the luke directory. Launch the client using the provided luke.bat or luke.sh scripts.

11993 questions
23
votes
1 answer

ElasticSearch's Fuzzy Query

I am brand new to ElasticSearch, and am currently exploring its features. One of them I am interested in is the Fuzzy Query, which I am testing and having trouble using. It is probably a dummy question so I guess someone who already used this…
A_dit_rien
  • 287
  • 1
  • 2
  • 7
22
votes
4 answers

What is the purpose of SpanQuery in Lucene?

Can someone explain what a SpanQuery is, and what are typical use cases for it? The documentation is very laconic, and keeps mentioning the concept of "span", which I'm not quite sure I get.
itsadok
  • 28,822
  • 30
  • 126
  • 171
22
votes
1 answer

Solr: What are the benefits of length normalization/omitNorms=false?

We're using Solr to search articles of various lengths. We index both descriptive metadata (title, author, category, keywords, etc) and the full article text. We do not boost relevance at index time - all boosts are done at query time (we use…
Oskar Austegard
  • 4,599
  • 4
  • 36
  • 50
22
votes
3 answers

When to definitely use SOLR over Lucene in a Sitecore 7 build?

My client does not have the budget to set up and maintain a SOLR server to use in their production environment. If I understand the Sitecore 7 Content Search API correctly, it is not a big deal to configure things to use Lucene instead. For the…
Patrick Jones
  • 1,896
  • 14
  • 26
22
votes
2 answers

Exact Meaning of "Slop" in Lucene SpanNearQuery (or slop in ElasticSearch span_near)

Question 1: In Lucene's SpanNearQuery (or span_near in ElasticSearch), what is the exact meaning of slop? Is it the number of words separating the two matching words, or is it the separating number of words plus 1? For example, suppose your indexed…
speedplane
  • 15,673
  • 16
  • 86
  • 138
22
votes
1 answer

What is omitNorms and version field in solr schema?

I do not understand when to use omitNorms="true". I read 2-3 links but I am still not clear on its meaning. What does it mean: "Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting…
Kamal Kishore
  • 325
  • 2
  • 4
  • 15
22
votes
4 answers

Multi-field, multi-word, match without query_string

I would like to be able to match a multi word search against multiple fields where every word searched is contained in any of the fields, any combination. The catch is I would like to avoid using query_string. curl -X POST…
brupm
  • 1,183
  • 1
  • 11
  • 25
21
votes
4 answers

Best way to deal with misspellings in a MySQL fulltext search

I have about 2000 rows in a MySQL database. Each row is a max of 300 characters and contains a sentence or two. I use MySQL's built-in fulltext search to search these rows. I would like to add a feature so that typos and accidental misspellings are…
Travis
  • 213
  • 1
  • 2
  • 5
21
votes
4 answers

package org.apache.commons.io does not exist error

I am compiling a .java file using the Ant compiler. I am getting the following error: "package org.apache.commons.io does not exist error". I downloaded the Apache Commons IO binaries and pasted the .jar files in "C:\Program…
samnaction
  • 1,194
  • 1
  • 17
  • 45
21
votes
1 answer

Apache Lucene vs Google Search Appliance

Has anyone come across the features of Apache Lucene? I heard it's even comparable to Google Search Appliance (GSA). I was looking for a definite comparison between the two, if possible. The comparisons available online are pretty vague.
Riju Mahna
  • 6,718
  • 12
  • 52
  • 91
21
votes
3 answers

How to search across all the fields?

In Lucene, we can use TermQuery to search a text with a field. I am wondering how to search a keyword across a bunch of fields or all the searchable fields?
Adam Lee
  • 24,710
  • 51
  • 156
  • 236
21
votes
3 answers

Faster search in Lucene - Is there a way to keep the whole index in RAM?

Is there a way of keeping the index in RAM instead of keeping it on the hard disk? We want to make searching faster.
elif
  • 5,427
  • 3
  • 28
  • 28
20
votes
3 answers

Which is the best choice for indexing a Boolean value in Lucene?

Indexing a Boolean value(true/false) in lucene(not need to store) I want to get more disk space usage and higher search performance doc.add(new Field("boolean","true",Field.Store.NO,Field.Index.NOT_ANALYZED_NO_NORMS)); //or doc.add(new…
Koerr
  • 15,215
  • 28
  • 78
  • 108
20
votes
1 answer

lucene good practice and thread safety

I'm using Lucene to index documents and perform a search, after which I immediately delete them. All this can be considered a somewhat atomic action that includes the following steps: index (writer) --> search (searcher) --> get docs by score …
levtatarov
  • 1,632
  • 6
  • 24
  • 36
20
votes
3 answers

Solr - LockObtainFailedException on multiple simultaneous writes

My application does very frequent Solr writes from multiple clients via REST. I'm using the autocommit feature by using the "commitWithin" attribute. LockObtainFailedException starts appearing after a couple of days of use. I'm having a hard time…
Nands
  • 1,541
  • 2
  • 20
  • 33