20

I'm indexing a Boolean value (true/false) in Lucene; it does not need to be stored. I want lower disk space usage and higher search performance.

doc.add(new Field("boolean","true",Field.Store.NO,Field.Index.NOT_ANALYZED_NO_NORMS));
//or
doc.add(new Field("boolean","1",Field.Store.NO,Field.Index.NOT_ANALYZED_NO_NORMS));
//or
doc.add(new NumericField("boolean",Integer.MAX_VALUE,Field.Store.NO,true).setIntValue(1));

Which should I choose? Or is there a better way?

Thanks a lot.

Koerr

3 Answers

11

An interesting question!

  • I don't think the third option (NumericField) is a good choice for a boolean field. I can't think of any use case for it.
  • The Lucene search index (leaving aside stored data, which you aren't using anyway) is an inverted index: each distinct term maps to the list of documents that contain it.
  • That leaves your first and second options (theoretically) identical: either way the field has two distinct terms, and only the text of those terms differs.

If I were faced with this, I think I would choose option one ("true" and "false" terms), if that helps with the final decision.

Your choice of NOT_ANALYZED_NO_NORMS looks good, I think.
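To illustrate the inverted-index point, here is a toy sketch in plain Java. It is not Lucene's actual data structure, and the class and method names are made up for this example. It only shows that the posting lists come out the same whether the terms are spelled "true"/"false" or "1"/"0":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> list of document IDs. Hypothetical, not Lucene code. */
class BooleanPostingsSketch {

    /** Builds the postings for one boolean field, given the term text used for true and false. */
    static Map<String, List<Integer>> index(boolean[] flags, String trueTerm, String falseTerm) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < flags.length; docId++) {
            String term = flags[docId] ? trueTerm : falseTerm;
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }
        return postings;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false, true, true, false};
        System.out.println(index(flags, "true", "false")); // {false=[1, 4], true=[0, 2, 3]}
        System.out.println(index(flags, "1", "0"));        // {0=[1, 4], 1=[0, 2, 3]}
    }
}
```

The document lists are identical in both runs; only the two dictionary entries are spelled differently, so any difference in size should be a few bytes in the term dictionary.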

Adrian Conlon
  • Hi @adrian-conlon, can you help with this? Thanks a lot: http://stackoverflow.com/questions/10464377/using-booleanquery-or-write-more-indexes – Koerr May 07 '12 at 11:00
3

Lucene jumps through an elaborate set of hoops to make NumericField searchable by NumericRangeQuery, so definitely avoid it in all cases where your values don't represent quantities. For example, even if you index an integer, but only as a unique ID, you would still want to use a plain String field. Using "true"/"false" is the most natural way to index a boolean, while using "1"/"0" gives just a slight advantage by avoiding the possibility of a case mismatch or typo. I'd say this advantage is not worth much, and I would go with "true"/"false".
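As a rough picture of those hoops: as far as I understand it, a numeric field indexes each value at several precisions, one term per shift of precisionStep bits, so that range queries can match whole blocks of values at once. The sketch below is a simplified model of that counting, not Lucene code, and the class and method names are hypothetical:

```java
/** Simplified model of how many terms one 32-bit value produces. Hypothetical, not Lucene code. */
class NumericTermCountSketch {

    /** Assumes one term per shift in 0, step, 2*step, ... while the shift stays below 32 bits. */
    static int termsPerIntValue(int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        int count = 0;
        // Use a long counter so that a step of Integer.MAX_VALUE cannot overflow.
        for (long shift = 0; shift < 32; shift += precisionStep) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(termsPerIntValue(4));                 // 8
        System.out.println(termsPerIntValue(8));                 // 4
        System.out.println(termsPerIntValue(Integer.MAX_VALUE)); // 1
    }
}
```

Under this model, the precisionStep of Integer.MAX_VALUE used in the question's third option yields a single term per value, which removes the extra terms but also leaves nothing that a plain string term does not already give you.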

Marko Topolnik
  • Besides NumericRangeQuery, keep in mind another benefit of NumericField: NumericField is ideal for sorting, because building the field cache is much faster than with text-only numbers. Source: http://lucene.apache.org/core/2_9_4/api/core/org/apache/lucene/search/NumericRangeQuery.html – Doug S Oct 20 '13 at 08:44
1

Use Solr (a search server built on Lucene): it indexes all basic Java types natively.

I've used it and it rocks.

Bohemian
  • Internally Solr will still be indexing using Lucene though, right? – Adrian Baker Aug 30 '22 at 00:05
  • @AdrianBaker I expect so, but I've never checked. Implementation choices should typically not be that important to the user of a library/product, because you can't change them and if you're using a library/product you trust it and its choices. – Bohemian Aug 30 '22 at 00:40