I have been saving some product specifications into Solr 5. Most of the products contain unique variant ids that use dashes or dots, like this: Samsung TV 54 : AD-oi-230, Sony TV 24 : 1.849.32s.s.
But occasionally I come across variant ids that use spaces instead of dashes, like Samsung 54 : OPD 1 jud, Sony 32 : s1 90 b33 9 337.
Since those ids don't have much meaning, if I removed those spaces (Samsung 54 : OPD1jud, Sony 32 : s190b339337), would it scale better or make the index size smaller?
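To make the idea concrete, this is roughly the preprocessing I have in mind before sending documents to Solr. It is only a sketch: the function name `collapse_variant_id` is made up, and it assumes every label is written as "product name : variant id" with " : " as the separator, as in my examples.

```python
def collapse_variant_id(label: str, sep: str = " : ") -> str:
    """Strip whitespace from the variant id that follows the separator.

    Assumes labels look like "<product name> : <variant id>"; the
    separator is an assumption taken from the examples in this post.
    """
    name, found, variant = label.partition(sep)
    if not found:
        # No separator present: leave the label untouched.
        return label
    # split() with no arguments drops every run of whitespace.
    return name + sep + "".join(variant.split())


if __name__ == "__main__":
    for label in ("Samsung 54 : OPD 1 jud", "Sony 32 : s1 90 b33 9 337"):
        print(collapse_variant_id(label))
```

Ids that already use dashes or dots would pass through unchanged, so only the space-separated ones are affected.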
Here is the field type I use for the field that stores the model name. I have enabled the WordDelimiterFilterFactory:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="0" generateNumberParts="1" splitOnCaseChange="0" catenateWords="1" splitOnNumerics="1" stemEnglishPossessive="0" generateWordParts="1" catenateAll="0" catenateNumbers="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
  </analyzer>
</fieldType>