Questions tagged [lucene]

The term "Lucene" refers to the open source Java full-text search engine library, and also to the entire ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene. "Lucene" may also be used to refer to top-level projects like Nutch and Tika, which were once sub-projects of Lucene.

Use the "Lucene" tag if any of the following apply:

  • The question is about the Java library.
  • The question is about a port of the library, but would make sense to people who know the Java library (many Lucene.NET questions meet this criterion).
  • The question is so general that it doesn't apply to a specific implementation.

References:

Basic Demo:

A basic "getting started" demo showing how to build and query an index is provided as part of the official documentation:

Basic Demo documentation (this link is for Lucene v8.7.0; newer versions may be available).

Links to the demo's source files are provided in the above documentation.

The source code is also available on GitHub.

Luke - a Lucene GUI Client:

Luke is a GUI client application for exploring your Lucene indexes. Recent versions of Luke are included in each binary release of Lucene.

After downloading the binary release, unzip it, and go to the luke directory. Launch the client using the provided luke.bat or luke.sh scripts.

11993 questions
45 votes, 3 answers

Lucene indexing: Store and indexing modes explained

I think I'm still not understanding the Lucene indexing options. The options are Store.Yes, Store.No and Index.Tokenized, Index.Un_Tokenized, Index.No, Index.No_Norms. I don't really understand the store option. Why would you ever want to…
Boris Callens
45 votes, 5 answers

Solr/Solrj: How can I determine the total number of documents in an index?

How can I determine the total number of documents in a Solr index using Solrj? After hours of searching on my own, I actually have an answer (given below); I'm only posting this question so others can find the solution more easily.
George Armhold
45 votes, 3 answers

Lucene Score results

In Lucene, if you had multiple indexes that each covered only one partition, why does the same search on different indexes return results with different scores? The results from different servers match exactly, i.e. if I searched for: Name - John…
Stephen Hendry
45 votes, 4 answers

TFIDF for Large Dataset

I have a corpus of around 8 million news articles, and I need to get the TFIDF representation of them as a sparse matrix. I have been able to do that using scikit-learn for a relatively small number of samples, but I believe it can't be used for…
apurva.nandan
45 votes, 6 answers

How to evaluate hosted full text search solutions?

What are the options when it comes to SaaS/hosted full text search? How should I evaluate the different options available? I'm looking for something that uses Lucene, Solr, or Sphinx on the backend, and provides a REST API for submitting documents…
James Cooper
44 votes, 4 answers

Entity Extraction/Recognition with free tools while feeding Lucene Index

I'm currently investigating the options to extract person names, locations, tech words and categories from text (a lot of articles from the web), which will then be fed into a Lucene/ElasticSearch index. The additional information is then added as…
Karussell
42 votes, 4 answers

Search engine Lucene vs Database search

I am using a MySQL database and have been using database-driven search. What are the advantages and disadvantages of database engines versus the Lucene search engine? I would like suggestions about when and where to use each.
S L
42 votes, 1 answer

solr search for documents where a field doesn't exist

How do I search for documents in a Solr index that do not contain a specified field?
Midhat
42 votes, 2 answers

How can I search on a list of values using Solr/Lucene?

Given the following query: (field:value1 OR field:value2 OR field:value3 OR ... OR field:value50) Can this be broken down into something less verbose? Basically I have hundreds of category IDs, and I need to search for items under large groups of…
Michael Moussa
42 votes, 1 answer

ElasticSearch - Searching For Human Names

I have a large database of names, primarily from Scotland. We're currently producing a prototype to replace an existing piece of software which carries out the search. This is still in production and we're aiming to get our results as close as…
Nathan Smith
42 votes, 11 answers

What is the best and most active open source .Net search technology?

I'm trying to decide on an open source search/indexing technology for a .Net project. It seems like the standard out there for Java projects is Lucene, but as far as .Net is concerned, the Lucene.Net project seems to be pretty inactive. Is this…
jamesaharvey
40 votes, 5 answers

SQL Server 2008 Full Text Search (FTS) versus Lucene.NET

I know there have been questions in the past about SQL 2005 versus Lucene.NET, but since 2008 came out with a lot of changes, I was wondering if anyone can give me pros/cons (or a link to an article).
ajma
40 votes, 4 answers

Difference between BooleanClause.Occur.MUST and BooleanClause.Occur.SHOULD in Lucene

Can anyone explain, with an example, the difference between BooleanClause.Occur.MUST and BooleanClause.Occur.SHOULD in a Lucene BooleanQuery?
Jagadesh
39 votes, 4 answers

How to use a Lucene Analyzer to tokenize a String?

Is there a simple way I could use any subclass of Lucene's Analyzer to parse/tokenize a String? Something like: String to_be_parsed = "car window seven"; Analyzer analyzer = new StandardAnalyzer(...); List tokenized_string =…
Felipe Hummel
35 votes, 5 answers

Is Solr available for .Net?

I want to learn Solr. Can anyone recommend some good tutorials or links for it? Also, is Solr available for .NET?
Ed.