Using Solr 5.4, I am trying to index and search postal codes phonetically. I have tried combining NGramFilterFactory and BeiderMorseFilterFactory, but it doesn't seem to work. For example, I want to store and index "AB11 9RD" and find it with a search for "a B 11 nine Rd". I am posting our schema.xml here. Any tips on how to implement this would be greatly appreciated.

<types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="postcode" class="solr.TextField" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="1"/>
        </analyzer>
    </fieldType>
    <fieldType name="postcode_phonetic" class="solr.TextField" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="1"/>
            <filter class="solr.UpperCaseFilterFactory"/>
            <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
        </analyzer>
    </fieldType>

<fields>
    <copyField source="Postcode" dest="PostcodePhonetic"/>
    <field name="Postcode"  type="postcode" indexed="true"  stored="true" multiValued="true"/>
    <field name="PostcodePhonetic"  type="postcode_phonetic" indexed="true" stored="false" multiValued="true"/>
    <field name="PostcodePhonetic2"  type="postcode_phonetic2" indexed="true" stored="false" multiValued="true"/>

2 Answers

That's not what phonetic search means. Phonetic search converts words into their "phonetic" representation, which just means that similar-sounding words convert to the same token. An example would be "nine" and "nhine" in this case. It will not turn "nine" into "9".
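To make the distinction concrete, here is a minimal sketch of phonetic hashing. It uses a simplified Soundex, which is an assumption made for brevity and is not the Beider-Morse algorithm that BeiderMorseFilterFactory applies; the `soundex` function is purely illustrative.

```python
# Simplified American Soundex, used here only to illustrate phonetic hashing.
# This is NOT the Beider-Morse algorithm used by BeiderMorseFilterFactory.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code, or "" if the word has no letters."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


print(soundex("nine"), soundex("nhine"), repr(soundex("9")))
```

Both spellings hash to the same code, while the numeral "9" has no letters to encode at all, so a phonetic filter alone cannot connect "nine" to "9".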

Use a <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="1" /> together with a synonym filter, where each digit has its textual form as a synonym. If you use a larger maxGramSize, you can also convert "11" into "eleven".
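As a rough sketch of what this suggestion produces at index time, the snippet below splits a postcode into single characters and attaches the spelled-out word to each digit. The `DIGIT_WORDS` table and the `expand_unigrams` helper are hypothetical, and dropping whitespace is an assumption of the sketch; this is plain Python, not Solr.

```python
# Hypothetical digit synonyms; a real setup would list these in a synonyms file.
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def expand_unigrams(postcode):
    """Return one set of alternative tokens per non-space character."""
    positions = []
    for ch in postcode.lower():
        if ch.isspace():
            continue  # assumption: whitespace grams are discarded
        alternatives = {ch}
        if ch in DIGIT_WORDS:
            alternatives.add(DIGIT_WORDS[ch])  # digit plus its textual form
        positions.append(alternatives)
    return positions


for alternatives in expand_unigrams("AB11 9RD"):
    print(sorted(alternatives))
```

With single-character grams, "11" is indexed as two separate "1" tokens, which is why a larger gram size is needed before a synonym such as "eleven" can apply.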

MatsLindh

Here is an update with a solution. If anyone can suggest a better one, please do.

  <analyzer type="index">
    <!-- Remove a space between word characters, so "AB11 9RD" becomes "AB119RD" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b \b" replacement=""/>
    <!-- Keep the whole postcode as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
    <!-- Join 2 to 7 adjacent tokens with no separator -->
    <filter class="solr.ShingleFilterFactory" tokenSeparator="" minShingleSize="2" maxShingleSize="7" outputUnigrams="false"/>
    <!-- Keep only joined candidates that are 6 or 7 characters long -->
    <filter class="solr.LengthFilterFactory" min="6" max="7"/>
  </analyzer>
</fieldType>
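
To see why these two analyzers meet on the same token, here is a rough Python simulation of both chains. It is only a sketch: the contents of synonyms.txt are not shown in the post, so the `SYNONYMS` mapping (spoken digits to numerals) is an assumption, and `\w+` only approximates StandardTokenizerFactory. The function names are hypothetical.

```python
import re

# Assumed content of synonyms.txt (not shown in the post), e.g. "nine => 9".
SYNONYMS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
            "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def index_token(postcode):
    """Index side: drop spaces between word characters, keep one lowercase token."""
    return re.sub(r"\b \b", "", postcode).lower()


def query_tokens(text, min_shingle=2, max_shingle=7, min_len=6, max_len=7):
    """Query side: tokenize, lowercase, apply synonyms, shingle, filter by length."""
    tokens = [SYNONYMS.get(t, t) for t in re.findall(r"\w+", text.lower())]
    shingles = set()
    for size in range(min_shingle, max_shingle + 1):
        for start in range(len(tokens) - size + 1):
            # An empty separator glues neighbouring tokens together.
            shingles.add("".join(tokens[start:start + size]))
    return {s for s in shingles if min_len <= len(s) <= max_len}


indexed = index_token("AB11 9RD")
candidates = query_tokens("a B 11 nine Rd")
print(indexed, sorted(candidates), indexed in candidates)
```

Under these assumptions the query produces the candidates "ab119rd" and "b119rd", and the first one equals the indexed token, so the document matches. The length filter is what discards the shorter shingles such as "ab" or "119rd".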