6

I am using Solr as a search engine. I have a text field that contains accented text such as "María". When a user searches for "María", a result is returned. But when a user searches for "Maria", no result is returned.

My schema definition looks like this:

<fieldtype name="my_text" class="solr.TextField">
       <analyzer type="Index">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32" side="front"/>
       </analyzer>
       <analyzer type="query">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>

       </analyzer>
</fieldtype>

Please help to solve this issue.

pavan kumar

2 Answers

13

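If you're on Solr 3.x or later, you can try solr.ASCIIFoldingFilterFactory, which converts accented characters to their unaccented equivalents in the basic ASCII character set (for example, "í" becomes "i").

Remember to put it after any stemming filter you have configured (you're not using one, so you should be OK). Because this changes the index-time analyzer, reindex your documents afterwards so that the tokens already in the index are folded as well.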

So your config could look like:

<fieldtype name="my_text" class="solr.TextField">
       <analyzer type="index">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.ASCIIFoldingFilterFactory"/>
           <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32" side="front"/>
       </analyzer>
       <analyzer type="query">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.ASCIIFoldingFilterFactory"/>

       </analyzer>
</fieldtype>
soulcheck
  • Thanks, it worked. But we have a multicore environment with thousands of cores. How do we update the schema.xml file of every core? Is there a way to accomplish this? – pavan kumar Apr 19 '14 at 13:37
  • @pavankumar That is a different question now. It is probably best handled by a deployment automation tool: Ansible, Chef, or Puppet, to name a few. – soulcheck Apr 19 '14 at 15:20
  • Yes, you are correct, it is deployment related. I will find out elsewhere how to do this. Thank you! – pavan kumar Apr 20 '14 at 06:43
0

Answering here because this is the first result that pops up when searching for "ignore accents solr".

In the schema.xml generated by haystack (and using aldryn_search, djangocms & djangocms-blog), the answer provided by @soulcheck works if you add the <filter class="solr.ASCIIFoldingFilterFactory"/> line in the text_en fieldType.
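As a rough sketch, the edited fieldType could look something like the following. The tokenizer, the positionIncrementGap value, and the lowercase filter shown here are assumptions for illustration only; the text_en definition that haystack actually generates contains more filters, so keep those as they are and only add the ASCIIFoldingFilterFactory lines to both analyzers.

```xml
<!-- Illustrative sketch only: your generated text_en will have additional filters. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added line: folds "María" to "maria" at index time -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added line: applies the same folding to the user's query -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the schema, rebuild the index so that existing documents are re-analyzed with the new filter.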


sodimel