Questions tagged [lucene]

The term "Lucene" refers to the open source Java full-text search engine library, and also to the entire ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene. "Lucene" may also be used to refer to top-level projects like Nutch and Tika, which were once sub-projects of Lucene.

Use the "Lucene" tag if any of the following applies:

  • The question is about the Java library
  • The question is about a port of the library, but would make sense to people who know the Java library (many Lucene.NET questions match this criterion).
  • The question is so general that it doesn't apply to a specific implementation.

References:

Basic Demo:

A basic "getting started" demo showing how to build and query an index is provided as part of the official documentation:

Basic Demo documentation (this link is for Lucene v8.7.0; newer versions may be available)

Links to the demo's source files are provided in the above documentation.

The source code can also be found on GitHub.
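
For readers who want a feel for what "building and querying an index" means before opening the demo, the idea can be sketched in plain Java. This sketch does not use Lucene's API and is not how Lucene is implemented: the class `ToyInvertedIndex` and its methods `add` and `search` are hypothetical names made up for illustration, and the tokenizer is a deliberately crude stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy class for illustration only; this is NOT Lucene's API.
class ToyInvertedIndex {
    // Maps each term to the sorted set of ids of the documents that contain it.
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Crude "analysis": lowercase the text and split on non-alphanumeric characters.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Indexing": record the document id under every term found in the text.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // "Querying": return the ids of documents that contain every query term (AND).
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index itself is not modified
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }
}
```

After adding a few documents with `add(id, text)`, a call to `search` with one or more words returns the ids of the documents that contain all of them, in ascending order. The official demo covers the same two steps with Lucene's own classes, along with scoring and on-disk storage, which this sketch leaves out.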

Luke - a Lucene GUI Client:

Luke is a GUI client application for exploring Lucene indexes. Recent versions of Luke ship with each Lucene binary release, which is available for download.

After downloading the binary release, unzip it, and go to the luke directory. Launch the client using the provided luke.bat or luke.sh scripts.

11,993 questions

17 votes, 1 answer

Document Similarity in ElasticSearch

I want to calculate similarity between two documents indexed in elasticsearch. I know it can be done in lucene using term vectors. What is the direct way to do it? I found that there is a similarity module doing exactly…
Pratik Poddar • 1,353 • 3 • 18 • 36

17 votes, 1 answer

Lucene.Net Search result to highlight search keywords

I use Lucene.Net to index some documents. I want to show the user a couple of lines as to why that document is in the result set. just like when you use google to search and it shows the link and followed by the link there are a few lines with the…
Ali Shafai • 5,141 • 8 • 34 • 50

17 votes, 3 answers

How to search an int field in Lucene 4?

I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to index like: Document doc = new Document(); doc.add(new StringField("ticket_number",…
Konrad Garus • 53,145 • 43 • 157 • 230

17 votes, 2 answers

Exact search in array object type using elasticsearch

I'm looking for a way to do exact array matches in elastic search. Let's say these are my documents: {"id": 1, "categories" : ["c", "d"]} {"id": 2, "categories" : ["b", "c", "d"]} {"id": 3, "categories" : ["c", "d", "e"]} {"id": 4, "categories" :…
Pascal • 5,879 • 2 • 22 • 34

17 votes, 4 answers

org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory

I am new to Java and Lucene. My code gets a line from a file and stores it in Lucene Index. But when I create an IndexReader to search and read from the index it throws an exception. My java code is below. On creating the IndexReader it throws an…
Ahmed Khakwani • 410 • 2 • 8 • 18

16 votes, 6 answers

How do I see/debug the way SOLR find it's results?

Let's say I search for "ABLS" and the SOLR returns a result that to me does not make any sense. How can I debug why SOLR picked this record to be returned?
Itay Moav -Malimovka • 52,579 • 61 • 190 • 278

16 votes, 2 answers

Improve multi-thread indexing with lucene

I am trying to build my indexes in Lucene with multiple threads. So, I started my coding and wrote the following code. First I find the files and for each file, I create a thread to index it. After that I join the threads and optimize the indexes.…
orezvani • 3,595 • 8 • 43 • 57

16 votes, 2 answers

Scoring of solr multivalued field

If I have a document with a multivalued field in Solr are the multiple values scored independently or just concatenated and scored as one big field? I'm hoping they're scored independently. Here's an example of what I mean: I have a document with…
user605331 • 3,718 • 4 • 33 • 60

16 votes, 1 answer

Extract tf-idf vectors with lucene

I have indexed a set of documents using lucene. I also have stored DocumentTermVector for each document content. I wrote a program and got the term frequency vector for each document, but how can I get tf-idf vector of each document? Here is my code…
orezvani • 3,595 • 8 • 43 • 57

16 votes, 3 answers

How datas are stored in lucene

I know that lucene creates an index and stores all the data. Can anyone tell me how the data is stored in a flat file, or what kind of algorithms they use to store the data in the backend so that they can retrieve it quickly?
Ramesh • 2,295 • 5 • 35 • 64

16 votes, 1 answer

My java process's file descriptors going "bad" and I have no idea why

I have a java webapp, built with Lucene, and I keep getting various "file already closed" exceptions - depending on which Directory implementation I use. I've been able to get "java.io.IOException Bad File Descriptor" and…
oorza • 161 • 5

16 votes, 4 answers

Lucene index backup

What is the best practice to backup a lucene index without taking the index offline (hot backup)?
yannisf • 6,016 • 9 • 39 • 61

16 votes, 1 answer

Document search on partial words

I am looking for a document search engine (like Xapian, Whoosh, Lucene, Solr, Sphinx or others) which is capable of searching partial terms. For example when searching for the term "brit" the search engine should return documents containing either…
GeneralBecos • 2,476 • 2 • 22 • 32

16 votes, 5 answers

Searching names with Apache Solr

I've just ventured into the seemingly simple but extremely complex world of searching. For an application, I am required to build a search mechanism for searching users by their names. After reading numerous posts and articles including: How can I…
shachibista • 287 • 3 • 11

16 votes, 4 answers

How do I sort Lucene results by field value using a HitCollector?

I'm using the following code to execute a query in Lucene.Net var collector = new GroupingHitCollector(searcher.GetIndexReader()); searcher.Search(myQuery, collector); resultsCount = collector.Hits.Count; How do I sort these search results based on…
Ed. • 1,654 • 7 • 20 • 33