Questions tagged [lucene]

The term "Lucene" refers to the open source Java full-text search engine library, and also to the entire ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene. "Lucene" may also be used to refer to top-level projects like Nutch and Tika, which were once sub-projects of Lucene.

Use the "Lucene" tag if any of the following applies:

  • The question is about the Java library.
  • The question is about a port of the library, but would make sense to people who know the Java library (many Lucene.NET questions match this criterion).
  • The question is so general that it doesn't apply to a specific implementation.

Basic Demo:

A basic "getting started" demo showing how to build and query an index is provided as part of the official documentation:

Basic Demo documentation (this link is for Lucene v8.7.0; newer versions may be available)

Links to the demo's source files are provided in the above documentation.

The source code can also be found here on GitHub.
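
To give a rough feel for the build-then-query workflow the demo covers, here is a minimal sketch of a toy inverted index using only the Java standard library. It is not the Lucene API and not the demo's code: the class `TinyIndex` and its methods `add`, `search`, and `get` are hypothetical names made up for this illustration, and the tokenizer is a naive lowercase-and-split stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy inverted index; illustrative only, NOT Lucene's API. */
public class TinyIndex {
    // term -> sorted set of ids of the documents containing that term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // stored original text, addressed by document id
    private final List<String> docs = new ArrayList<>();

    /** Naive analyzer: lowercase, then split on non-alphanumeric characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Index a document and return its id. */
    public int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Return the ids of documents that contain every query term (AND semantics). */
    public List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> hits = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    /** Fetch the stored text of a document by id. */
    public String get(int id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("Lucene is a full-text search library");
        index.add("Solr is built on Lucene");
        index.add("Luke explores an index");
        for (int id : index.search("lucene search")) {
            System.out.println(id + ": " + index.get(id));
        }
    }
}
```

The real library adds analysis chains, scoring, on-disk segments, and a query language on top of this basic idea, so the official demo remains the place to start for actual Lucene code.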

Luke - a Lucene GUI Client:

Luke is a GUI client application which can be used to explore your Lucene indexes. Recent versions of Luke are now provided as part of each binary release, which can be downloaded from here.

After downloading the binary release, unzip it, and go to the luke directory. Launch the client using the provided luke.bat or luke.sh scripts.

11993 questions
3
votes
2 answers

SOLR: problems with NGramFilterFactory

I am running SOLR as search engine for an intranet with just over 40000 docs. I keep it very simple by using the copyField directive to copy the title and the keywords fields to the content field and index only that. Since now we were using this…
harpax
3
votes
1 answer

Lucene Analyzer for Indexing and Searching

I have a field that I am indexing with Lucene like so: @Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES) public HungerState getHungerState() { The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY When these…
schmimd04
3
votes
1 answer

Solr Sort Performance issues

I am trying to specify a sort on a String field in my query but am seeing memory issues, since the index has around 50M docs. Why is it that Solr actually sorts the field values for all the documents in the index and NOT just the hits returned from the…
Satya
3
votes
1 answer

What Solr tokenizer and filters can I use for a strong general site search?

I'd like to ensure that searching for, say, I.B.M. can be found by searching for ibm. I'd also like to make sure that Dismemberment Plan could be found by searching for dismember. Using Solr, what tokenizer and filters can I use in analysis and…
Carson
3
votes
1 answer

Different pages to different Nutch cores (within the same domain)

How can I instruct Nutch to treat page#1 as belonging to a core and page#2 as belonging to a different core (both pages from the same domain)? Practical situation: let's say Nutch is crawling and indexing www.businessweek.com; let's also say that I…
3
votes
1 answer

How to find similar documents

How do you find documents similar to a given document in Lucene? I do not know what the text is; I only know what the document is. Is there a way to find similar documents in Lucene? I am a newbie, so I may need some hand-holding.
Luke101
3
votes
1 answer

how do I make SOLR find mistakes?

I am looking for all documents with "symptoms" in them. I want the same or close results also for the following: simptom semptm sympt etc. This is just an example of what I mean; I do not need a solution just for this specific word. How do I…
Itay Moav -Malimovka
3
votes
1 answer

Lucene Tokenizer with LookAhead

Can anyone point me in the right direction for implementing a Lucene Tokenizer with LookAhead? I'm using a snowball stemmer and I want to be able to get phrases of city names and prevent them from being stemmed, so that "Los Angeles" will be set as…
isapir
3
votes
3 answers

How to have different schema files to different cores in SOLR?

I have one instance of SOLR with three different cores. I created a solr.xml config file which specifies the schema file for each core, but, it is not recognized. The system still tries to load the default schema.xml (I removed it, so it fails). …
Itay Moav -Malimovka
3
votes
1 answer

Creating and using LuceneAnalysisDefinitionProvider with Hibernate Search

When you search Stackoverflow or the Internet for LuceneAnalysisDefinitionProvider, you'll find hundreds of pages, each of them having the same code copied from another page without any decent explanation or further examples of usage. So I tried to…
horvoje
3
votes
1 answer

maximum chars in Solr/lucene term for fuzzy match

I am trying to experiment with fuzzy matching in Solr. In the first_name field of my indexed document I stored "MYNEWORGANIZATION20SEP2011"; the word was actually "My New Organization 20-Sep-2011", but I removed spaces and other chars. Now above word…
Rushik
3
votes
1 answer

lucene .net parser for filter and sorting

In our lucene .net based search (Lucene 4.8.0-beta00016) we save the generated query, the filter and the sorting in a custom text file. e.g.: "Query":"+name:*test*" "Filter":"BooleanFilter(+type:project)" "Sort":"!" We built…
SvenG
3
votes
3 answers

Grouping Lucene search results and calculating frequency by category

I am working on a store search API using Lucene. I need to show store search results for each City, State combination with its frequency in brackets, for example: Los Angeles, CA (450) Atlanta, GA (212) Boston, MA (78) ... As of now, my search…
Steve Chapman
3
votes
1 answer

Lucene proximity search with boundaries?

Is there a way to perform a proximity search that is bounded, not by a fixed number of tokens, but by 2 marker tokens of some kind? For example, to implement proximity queries that are bounded inside a single sentence or paragraph? Obviously the…
Lilith River
3
votes
5 answers

Find all available values for a field in lucene .net

If I have a field x, that can contain a value of y, or z etc, is there a way I can query so that I can return only the values that have been indexed? Example x available settable values = test1, test2, test3, test4 Item 1 : Field x = test1 Item 2 :…
mickyjtwin