
I am investigating WordDelimiterFilterFactory.

I am confused about the generateWordParts and splitOnCaseChange parameters.

From the Javadoc:
generateWordParts:

/**
   * Causes parts of words to be generated:
   * <p>
   * "PowerShot" =&gt; "Power" "Shot"
   */
  public static final int GENERATE_WORD_PARTS = 1;

splitOnCaseChange:

 /**
   * If not set, causes case changes to be ignored (subwords will only be generated
   * given SUBWORD_DELIM tokens)
   */
  public static final int SPLIT_ON_CASE_CHANGE = 64;

Can you show an example that clarifies the difference?

P.S.

Also, I don't understand the meaning of SUBWORD_DELIM.

gstackoverflow
  • @Downvoters, do you really think that the Javadoc is clear? – gstackoverflow Sep 21 '17 at 09:57
  • There is more documentation than just the documentation of the constants in the Java file, though. From the documentation: generateWordParts: (integer, default 1) If non-zero, splits words at delimiters. For example: "CamelCase", "hot-spot" -> "Camel", "Case", "hot", "spot" – MatsLindh Sep 22 '17 at 00:12

1 Answer


First: The WordDelimiterFilter has been superseded by the Word Delimiter Graph Filter (which plays nice with phrase queries).

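generateWordParts considers more than just case differences: for example, foo-bar is split into foo and bar at the hyphen. splitOnCaseChange has no effect on that token, because it contains no case change; it only decides whether a transition such as the one in PowerShot counts as a split point.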
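
To make the distinction concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene's implementation: the class WordPartsSketch and its split method are hypothetical names, it treats every non-alphanumeric character as a delimiter, it ignores the other flags (splitOnNumerics, catenateWords, preserveOriginal and so on), and it assumes that nothing is emitted when generateWordParts is off, whereas what the real filter emits in that case depends on those other flags.

```java
import java.util.ArrayList;
import java.util.List;

final class WordPartsSketch {

    /**
     * Toy model, NOT Lucene's code: splits a token at every non-alphanumeric
     * character and, if requested, at lower-to-upper case changes.
     */
    static List<String> split(String token, boolean generateWordParts, boolean splitOnCaseChange) {
        List<String> parts = new ArrayList<>();
        if (!generateWordParts) {
            return parts; // assumption of this sketch: no word parts are emitted at all
        }
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                flush(current, parts); // delimiters always end the current part
                continue;
            }
            boolean caseChange = current.length() > 0
                    && Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(c);
            if (splitOnCaseChange && caseChange) {
                flush(current, parts); // a case change only splits when the flag is set
            }
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    // Moves the accumulated characters into the result list, if there are any.
    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("PowerShot", true, true));  // [Power, Shot]
        System.out.println(split("PowerShot", true, false)); // [PowerShot]
        System.out.println(split("foo-bar", true, true));    // [foo, bar]
        System.out.println(split("foo-bar", true, false));   // [foo, bar]
    }
}
```

In this model splitOnCaseChange changes the result only for PowerShot, while foo-bar is split at the hyphen either way.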

SUBWORD_DELIM refers to one of the character types used by the types parameter, which lets you include a file defining which characters split a token into subwords:

types
(optional) The pathname of a file that contains character => type mappings, which enable customization of this filter’s splitting behavior. Recognized character types: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, and SUBWORD_DELIM.

I assume you could declare a character such as '|' or '.' as SUBWORD_DELIM, so that a word containing either of those characters would be split into two tokens.

MatsLindh
  • Can you provide the class name for the Word Delimiter Graph Filter? I could not find it. – gstackoverflow Sep 21 '17 at 09:39
  • It's [WordDelimiterGraphFilter](https://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterGraphFilter.html) and Solr's factory is named the usual way, `solr.WordDelimiterGraphFilterFactory`. – MatsLindh Sep 21 '17 at 10:52
  • It looks like WordDelimiterGraphFilterFactory is not available in hibernate-search. – gstackoverflow Sep 21 '17 at 11:04
  • It behaves mostly the same as the WordDelimiterFilter, but has better / actual support for phrase queries. [It was introduced as part of Solr 6.5](https://issues.apache.org/jira/browse/LUCENE-7619). – MatsLindh Sep 22 '17 at 00:13