45

I think I still don't understand the Lucene indexing options.

The available options are

  • Store.Yes
  • Store.No

and

  • Index.Tokenized
  • Index.Un_Tokenized
  • Index.No
  • Index.No_Norms

I don't really understand the store option. Why would you ever want to NOT store your field?
As I understand it, tokenizing means splitting up the content and removing the noise words/separators (like "and", "or", etc.).
I don't have a clue what norms could be. How are tokenized values stored?
What happens if I store a value "my string" in "fieldName"? Why doesn't a query

fieldName:my string

return anything?

柯鴻儀
Boris Callens

3 Answers

86

Store.Yes

Means that the value of the field will be stored in the index.

Store.No

Means that the value of the field will NOT be stored in the index.

Store.Yes/No does not affect indexing or searching with Lucene. It just tells Lucene whether you want it to act as a datastore for the values in the field. If you use Store.Yes, then when you search, the value of that field will be included in the Documents in your search results.

If you're storing your data in a database and only using the Lucene index for searching, then you can get away with Store.No on all of your fields. However, if you're using the index as storage as well, then you'll want Store.Yes.
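To make the split between "searchable" and "retrievable" concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's API: the class `ToyStoreDemo` and its methods are hypothetical names invented for illustration, and it treats each field value as a single term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (NOT Lucene's API): searching and retrieving use separate structures. */
class ToyStoreDemo {
    // "field:value" term -> ids of the documents containing it (used for searching)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // doc id -> (field -> original value), filled only for stored fields
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    /** Starts a new document and returns its id. */
    int newDocument() {
        storedFields.add(new HashMap<>());
        return storedFields.size() - 1;
    }

    /** Indexes the value as one term; keeps the original only if store is true. */
    void addField(int docId, String field, String value, boolean store) {
        postings.computeIfAbsent(field + ":" + value, k -> new ArrayList<>()).add(docId);
        if (store) {
            storedFields.get(docId).put(field, value);
        }
    }

    /** Returns the ids of the documents whose field contains exactly this term. */
    List<Integer> search(String field, String value) {
        return postings.getOrDefault(field + ":" + value, new ArrayList<>());
    }

    /** Returns the stored value, or null if the field was not stored. */
    String storedValue(int docId, String field) {
        return storedFields.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyStoreDemo index = new ToyStoreDemo();
        int doc = index.newDocument();
        index.addField(doc, "title", "lucene", true);   // like Store.Yes
        index.addField(doc, "body", "indexing", false); // like Store.No
        System.out.println(index.search("body", "indexing")); // the document is still found
        System.out.println(index.storedValue(doc, "body"));   // but its value cannot be read back
        System.out.println(index.storedValue(doc, "title"));
    }
}
```

In this model the unstored field is still found by a search, but its value has to come from somewhere else (for example your database), which is the trade-off described above.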

Index.Tokenized

Means that the field will be tokenized when it's indexed (you got that one). This is useful for long fields with multiple words.

Index.Un_Tokenized

Means that the field will not be analyzed and will be indexed as a single term. This is useful for keyword/single-word fields and some short multi-word fields.

Index.No

Exactly what it says. The field will not be indexed and is therefore unsearchable. However, you can use Index.No along with Store.Yes to store a value that you don't want to be searchable.

Index.No_Norms

Same as Index.Un_Tokenized, except that a few bytes will be saved by not storing some normalization data (the "norms"). This data is what is used for boosting and field-length normalization.
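As a rough illustration of how the terms differ between the index modes, here is a small sketch in plain Java. It is not Lucene code: `ToyIndexModes` is a hypothetical name, and the "analyzer" here only lowercases and splits on non-alphanumeric characters, whereas a real analyzer may also drop stop words or apply other filters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy model (NOT Lucene's API) of the terms each index mode would produce. */
class ToyIndexModes {
    /** Like Index.Tokenized: lowercase, then split on anything that is not a letter or digit. */
    static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Like Index.Un_Tokenized: the whole value becomes one term, unchanged. */
    static List<String> unTokenized(String value) {
        return Collections.singletonList(value);
    }

    /** Like Index.No: no terms at all, so nothing can match. */
    static List<String> notIndexed(String value) {
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(tokenized("My String"));   // two separate terms
        System.out.println(unTokenized("My String")); // one term containing the space
        System.out.println(notIndexed("My String"));  // no terms
    }
}
```

Under these assumptions a tokenized "My String" is only findable through the individual terms `my` and `string`, while the un-tokenized version is only findable through the exact value `My String`, including its case and the space.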

For further reading, the Lucene javadocs are priceless (current API version 4.4.0).

As for your last question, about why your query isn't returning anything: without knowing any more about how you're indexing that field, I'd say it's because your fieldName qualifier is only attached to the term 'my'; the term 'string' is searched in the default field instead. To search for the phrase "my string" you want:

fieldName:"my string"

To search for the words "my" and "string" in the fieldName field (with the query parser's default OR operator, a document containing either word matches):

fieldName:(my string)
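The difference between these three query forms can be sketched with a toy clause splitter in plain Java. This is not Lucene's query parser: `ToyQuerySplit` is a hypothetical name, the default field name `contents` is an assumption, and only spaces, double quotes, and parentheses are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Lucene's QueryParser): shows which field each top-level clause targets. */
class ToyQuerySplit {
    /** Splits on spaces outside quotes/parentheses; unqualified clauses get the default field. */
    static List<String> clauses(String query, String defaultField) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        int depth = 0;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')') {
                depth--;
            }
            if (c == ' ' && !inQuotes && depth == 0) {
                flush(cur, defaultField, out); // a top-level space ends the clause
            } else {
                cur.append(c);
            }
        }
        flush(cur, defaultField, out);
        return out;
    }

    private static void flush(StringBuilder cur, String defaultField, List<String> out) {
        if (cur.length() == 0) {
            return;
        }
        String clause = cur.toString();
        // Crude check: a clause with a colon is treated as already carrying its own field.
        out.add(clause.indexOf(':') > 0 ? clause : defaultField + ":" + clause);
        cur.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(clauses("fieldName:my string", "contents"));
        System.out.println(clauses("fieldName:\"my string\"", "contents"));
        System.out.println(clauses("fieldName:(my string)", "contents"));
    }
}
```

In this sketch the unquoted query falls apart into one clause on `fieldName` and one on the default field, while the quoted and parenthesized forms stay a single clause on `fieldName`.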

Alberto
dustyburwell
  • Thanks, that clears up a thing or two. Still not sure what I'm doing wrong with my indexing/searching though. But now I got a better view at what I'm doing. – Boris Callens Mar 17 '09 at 07:50
  • Are you using 2.4.1? Because those Field.Index values have been deprecated in favor of new names which are a bit clearer, IMO. See http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/document/Field.Index.html. – Jegschemesch Apr 14 '09 at 05:14
  • Well, I (and the OP if I'm not mistaken) have been using Lucene.Net which is quite a bit behind. I don't recall which version the port is equivalent to at this point, but those are the values that it has available. – dustyburwell Apr 14 '09 at 17:43
  • As far as I know the Lucene.net version numbers match the Lucene version they're ported from – Nick Jul 08 '09 at 13:42
  • With lucene 2.9.1, INDEX.TOKENIZED is deprecated. The documentation says it is just renamed to ANALYZER, but I don't think the meaning has stayed the same. Anyone know any more details about INDEX.ANALYZER? – Flynn81 Feb 10 '10 at 19:25
  • The Field.Index.ANALYZED does tokenization, which is the reason Field.Index.TOKENIZED now refers to it. – Steen Feb 10 '10 at 20:04
  • I am using Lucene 7.3.0 and am very confused about this migration. Can you tell me how it works now? – iamatsundere181 Jul 08 '19 at 10:39
2

In case any Java users stumble upon this, the same options in the March 2009 answer still exist in the Lucene 4.6.0 Java library but are deprecated. The current way to set these options is via FieldType.

Ian Durkan
0

Store.YES also gives you the ability to highlight the words that match your search keywords (via the highlight function). It means the value is not just retrieved, but can also be displayed.

ridhopratama