
My Solr 4.1.0 installation does not find anything with phonetic encoding. Here are the relevant excerpts from schema.xml:

<field name="textsuggest" type="text_suggest" indexed="true" stored="true" omitNorms="true" multiValued="true" />
<field name="textphon" type="text_phonetic_do" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="false" multiValued="true" />
<copyField source="textsuggest" dest="textphon"/>

...

<fieldType name="text_phonetic_do" class="solr.TextField"
    positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.GermanNormalizationFilterFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="lang/synonyms_de.txt"
            ignoreCase="true" expand="false" /> 
        <filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic" inject="false" />
    </analyzer>
</fieldType>

text_suggest is more or less a lowercased version of the original text, tokenized with solr.StandardTokenizerFactory and solr.WordDelimiterFilterFactory. The ColognePhonetic encoder is specialized for German words. The synonym filter handles some domain-specific terms. I was inspired by http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/.

I index an entry containing "Geprüfter Betriebswirt" (among other values) in textsuggest. When I search for "Betriebswirt", I get the expected results. However, when I search for "Betribswirt", which is just a minor misspelling of the original German word, Solr reports 0 hits.

In the Analysis view of Solr's admin GUI, I tried different spellings of "Betriebswirt" with my field type text_phonetic_do, and they are all encoded to the same numeric token:

  • betriebswirt => 12718372
  • betribswirt => 12718372
  • betribswiiirt => 12718372
  • petribswiert => 12718372
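The Cologne phonetics ("Kölner Phonetik") rules explain why all four spellings collapse to one code: vowels (plus J and Y) map to 0 and are dropped after the first position, B and P (when not before H) both map to 1, and runs of equal codes are collapsed. Below is a minimal Python sketch of the textbook rule table, for illustration only. It is an assumption-labeled re-implementation, not the Apache Commons Codec ColognePhonetic class that solr.PhoneticFilterFactory actually uses, so rare edge cases may differ.

```python
# Minimal sketch of the Kölner Phonetik ("Cologne phonetics") rules.
# Assumption: illustrative re-implementation of the textbook rules, NOT the
# commons-codec ColognePhonetic code used by Solr; edge cases may differ.


def cologne_phonetic(word):
    # Fold German special characters, then keep only the letters A-Z.
    word = word.replace("ß", "s").upper()
    for umlaut, base in (("Ä", "A"), ("Ö", "O"), ("Ü", "U")):
        word = word.replace(umlaut, base)
    letters = [c for c in word if "A" <= c <= "Z"]

    out = []
    last = ""  # code of the previous letter, used to collapse repeats
    for i, c in enumerate(letters):
        prev = letters[i - 1] if i > 0 else ""
        nxt = letters[i + 1] if i + 1 < len(letters) else ""
        if c in "AEIJOUY":
            code = "0"
        elif c == "H":
            code = "-"  # ignored, but still breaks a run of equal codes
        elif c == "B":
            code = "1"
        elif c == "P":
            code = "3" if nxt == "H" else "1"
        elif c in "DT":
            code = "8" if nxt in {"C", "S", "Z"} else "2"
        elif c in "FVW":
            code = "3"
        elif c in "GKQ":
            code = "4"
        elif c == "C":
            if i == 0:
                code = "4" if nxt in set("AHKLOQRUX") else "8"
            elif prev in {"S", "Z"}:
                code = "8"
            else:
                code = "4" if nxt in set("AHKOQUX") else "8"
        elif c == "X":
            code = "8" if prev in {"C", "K", "Q"} else "48"
        elif c == "L":
            code = "5"
        elif c in "MN":
            code = "6"
        elif c == "R":
            code = "7"
        else:  # S, Z
            code = "8"

        for digit in code:
            # Collapse repeated codes and drop "0" except at the very start.
            if digit != "-" and digit != last and (digit != "0" or not out):
                out.append(digit)
            last = digit
    return "".join(out)


for spelling in ("betriebswirt", "betribswirt", "betribswiiirt", "petribswiert"):
    print(spelling, "=>", cologne_phonetic(spelling))
```

With this sketch, each of the four spellings yields 12718372, the same code the Analysis view reports. The check confirms that the analysis chain is not the problem, which is consistent with the conclusion below.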

So the encoding (at both index time and query time) works as expected. But, as mentioned above, Solr does not find any document when searching for a phonetic variant.

Using the Query view, even the query textphon:Betriebswirt does not return a single result. The full query response (with the timing section stripped) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="indent">true</str>
    <str name="q">textphon:Betriebswirt</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0">
</result>
<lst name="debug">
  <str name="rawquerystring">textphon:Betriebswirt</str>
  <str name="querystring">textphon:Betriebswirt</str>
  <str name="parsedquery">textphon:12718372</str>
  <str name="parsedquery_toString">textphon:12718372</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
</lst>
</response>

I don't know why it doesn't find anything. If I understand the debug output correctly, the index is even searched for the right (i.e. phonetically encoded) token.

So what am I missing? Can anybody point me in the right direction? Thanks!

chammp
  • Are you chaining your copyFields? – Avi Kaminetzky Apr 19 '18 at 17:11
  • Honestly, I don't remember anymore; the problem is more than 5 years old :D I'd mark the question as obsolete or something, but I'm not sure that is possible. I just know that it works by now (I switched to Elastic, though). – chammp Apr 20 '18 at 07:11

0 Answers