Questions tagged [lucene]

The term Lucene refers to the open source Java full-text search engine library, but also to the entire ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene.

The term "Lucene" refers to the open source Java full-text search engine library, and also to the entire ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene. "Lucene" may also be used to refer to top-level projects like Nutch and Tika, which were once sub-projects of Lucene.

Use the "Lucene" tag if any of the following applies:

  • The question is about the Java library
  • The question is about a port of the library, but would make sense to people who know the Java library (many Lucene.NET questions match these criteria).
  • The question is so general it doesn't apply to a specific implementation (example).

References:

Basic Demo:

A basic "getting started" demo showing how to build and query an index is provided as part of the official documentation:

Basic Demo documentation - (this link is for Lucene v8.7.0. Newer versions may be available)

Links to the demo's source files are provided in the above documentation.

The source code can also be found here on GitHub.

Luke - a Lucene GUI Client:

Luke is a GUI client application which can be used to explore your Lucene indexes. Recent versions of Luke are now provided as part of each binary release, which can be downloaded from here.

After downloading the binary release, unzip it, and go to the luke directory. Launch the client using the provided luke.bat or luke.sh scripts.

11993 questions
18
votes
4 answers

SOLR and Natural Language Parsing - Can I use it?

Requirements Word frequency algorithm for natural language processing Using Solr While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP. I thought of SOLR…
andy
  • 8,775
  • 13
  • 77
  • 122
18
votes
1 answer

lucene Fields vs. DocValues

I'm using and playing with Lucene to index our data and I've come across some strange behaviors concerning DocValues fields. So, could anyone please just explain the difference between a regular Document field (like StringField, TextField, IntField…
Yossi Vainshtein
  • 3,845
  • 4
  • 23
  • 39
18
votes
1 answer

EdgeNGram: Error instantiating class: 'org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory'

I've set up Solr, so far everything's working just dandy, but now I wanted to add the EdgeNGram functionality to my searches. However, as soon as I throw it into my schema.xml, it starts throwing the…
2fat2kidnap
  • 451
  • 4
  • 9
18
votes
5 answers

Situations to prefer Apache Lucene over Solr?

There are several advantages to using Solr 1.4 (out-of-the-box faceted search, grouping, replication, HTTP administration vs. Luke, ...). Even if I embed search functionality in my Java application, I could use SolrJ to avoid the HTTP trade-off…
Karussell
  • 17,085
  • 16
  • 97
  • 197
18
votes
6 answers

Update specific field on SOLR index

I want to use Solr to search articles. I have 3 tables: Group (id, group name), ArticleBase (id, groupId, some other field), Article (id, articleBaseId, title, date, ...). In the Solr schema.xml file I just define all article fields that are mixed with…
Hamid
  • 1,099
  • 3
  • 22
  • 37
18
votes
5 answers

Using Lucene to count results in categories

I am trying to use Lucene Java 2.3.2 to implement search on a catalog of products. Apart from the regular fields for a product, there is field called 'Category'. A product can fall in multiple categories. Currently, I use FilteredQuery to search for…
Sachin
18
votes
6 answers

Keyword (OR, AND) search in Lucene

I am using Lucene in my portal (J2EE based) for indexing and search services. The problem is about the keywords of Lucene. When you use one of them in the search query, you'll get an error. For example: searchTerms = "ik OR jij" This works fine,…
Areca
  • 1,292
  • 4
  • 11
  • 21
18
votes
4 answers

Lucene Index problems with "-" character

I'm having trouble with a Lucene index that has indexed words containing "-" characters. It works for some words that contain "-" but not for all, and I can't find the reason why it's not working. The field I'm searching in is analyzed and…
Zteve
  • 341
  • 1
  • 2
  • 12
17
votes
3 answers

How to clear the cache in Solr?

I'm trying to compare the performance of different Solr queries. In order to get a fair test, I want to clear the cache between queries. How is this done? Of course, one can restart the server, but I was curious if there is a quicker way.
Eric Wilson
  • 57,719
  • 77
  • 200
  • 270
17
votes
4 answers

Change dynamically elasticsearch synonyms

Is it possible to store the synonyms for elasticsearch in the index? Or is it possible to get the synonym list from a database like couchdb? I'd like to add synonyms dynamically to elasticsearch via the REST-API.
Medrod
  • 986
  • 7
  • 17
17
votes
4 answers

Is {Filter}ing faster than {Query}ing in Lucene?

While reading "Lucene in Action 2nd edition" I came across the description of Filter classes, which can be used for result filtering in Lucene. Lucene has a lot of filters that duplicate Query classes. For example, NumericRangeQuery and…
Denis Bazhenov
  • 9,680
  • 8
  • 43
  • 65
17
votes
1 answer

Lucene Returning Documents with non positive score

We have recently upgraded a CMS we work on and had to move from Lucene.net V2.3.1.301 to V2.9.4.1. We used a CustomScoreQuery in our original solution which did various filtering that couldn't be achieved with the built-in queries. (GEO, Multi Date…
Ettienne
  • 365
  • 3
  • 9
17
votes
2 answers

"Nothing to start" when trying to start Apache Solr

I have Ubuntu 14.10 and now I want to install and try Apache Solr. First of all, I visited the official Apache Solr page and downloaded a zip archive. Then I unzipped it into a folder called solr, so that this manually created folder now contains these…
Jacobian
  • 10,122
  • 29
  • 128
  • 221
17
votes
1 answer

Accessing Lucene query in Elastic Search's native script scorer

I'd like to write a custom Elastic Search scorer that takes all terms from the document in the index and all terms from the query, and calculates the score based on some custom logic. After some research, it seems that the most straightforward way to…
Lukáš Lalinský
  • 40,587
  • 6
  • 104
  • 126
17
votes
2 answers

Solr facet sum instead of count

I'm new to Solr and I'm interested in implementing a special facet. Sample documents: { hostname: google.com, time_spent: 100 } { hostname: facebook.com, time_spent: 10 } { hostname: google.com, time_spent: 30 } { hostname: reddit.com, time_spent:…
advait
  • 6,355
  • 4
  • 29
  • 39