
I have an ecommerce site where I am implementing Solr (using the Solarium library), and some product names and descriptions contain double quotes (usually standing for inches). Before I understood Solr's analyzers and tokenizers, I simply assigned the text_en_splitting field type to the fields that hold this data.

If someone searches for the phrase blue 1" binder, the double quote is removed, and the first 10 results returned are not necessarily binders. They seem to match only the word blue and the number 1. Looking at the query analysis in the Solr admin, I can see that the double quote is removed by WordDelimiterFilterFactory. I like WordDelimiterFilterFactory for other reasons (such as handling the phrase post-it note), so I'm trying to find a happy medium.

Is there a better way to index and query fields containing double quotes that should be kept when searching, because they actually mean something?

phpSteve

2 Answers


What I ended up doing was adding a pattern replace filter before the word delimiter filter, so that a double quote following a digit is replaced with the word "inch":

<filter class="solr.PatternReplaceFilterFactory" pattern='(\d)"' replacement='$1 inch' replace="all"/>
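To see what that filter does to a single token, here is a minimal Java sketch that applies the same pattern and replacement with java.util.regex, which uses the same $1 group-reference syntax. It only imitates the string replacement; the class and method names are hypothetical. In the real analyzer chain the resulting "1 inch" would then presumably be split by the word delimiter filter into 1 and inch, and the filter presumably has to be added to both the index and the query analyzer so that documents and queries are normalized the same way.

```java
import java.util.regex.Pattern;

public class InchFilterSketch {
    // Same regex as the filter's pattern attribute: a captured digit followed by a double quote.
    private static final Pattern DIGIT_QUOTE = Pattern.compile("(\\d)\"");

    // Mirrors replace="all": every match is replaced, keeping the captured digit via $1.
    static String apply(String token) {
        return DIGIT_QUOTE.matcher(token).replaceAll("$1 inch");
    }

    public static void main(String[] args) {
        System.out.println(apply("1\""));        // 1 inch
        System.out.println(apply("2.5\""));      // 2.5 inch
        System.out.println(apply("\"quoted\"")); // "quoted" (no digit before the quotes, unchanged)
    }
}
```

Because the pattern requires a digit in front of the quote, quotes used for other purposes are left alone.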
intnick

Solr query parsers (such as DisMax) call

SolrPluginUtils.stripUnbalancedQuotes(userQuery)

to remove unbalanced quotes, because balanced quotes are reserved for phrase queries.
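The effect on the query blue 1" binder can be illustrated with a small Java re-implementation of the behavior stripUnbalancedQuotes is documented to have: the input is returned unchanged when it contains an even number of " characters, and otherwise every " is removed. This is a sketch for illustration, not Solr's source; the class name is hypothetical.

```java
public class UnbalancedQuotesSketch {
    // Hypothetical stand-in for the documented behavior of
    // SolrPluginUtils.stripUnbalancedQuotes: keep the input if the number of
    // '"' characters is even, otherwise drop every '"'.
    static String stripUnbalancedQuotes(String s) {
        long quotes = s.chars().filter(c -> c == '"').count();
        return (quotes % 2 == 0) ? s : s.replace("\"", "");
    }

    public static void main(String[] args) {
        System.out.println(stripUnbalancedQuotes("blue 1\" binder"));     // blue 1 binder
        System.out.println(stripUnbalancedQuotes("\"blue binder\""));     // "blue binder"
        System.out.println(stripUnbalancedQuotes("\"blue\" 1\" binder")); // blue 1 binder
    }
}
```

The last case shows that a single inch mark makes the quote count odd, so even legitimate phrase quotes in the same query are discarded.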

So if the quote has to reach Solr intact, you would really need to write your own query parser.

You could also consider replacing the quotes with a unit word (for a double quote, "inch") in the front end, before the query reaches Solr.
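A front-end rewrite could look roughly like the following. The asker's client is PHP (Solarium), so this Java version only shows the idea (PHP's preg_replace accepts the same lookbehind pattern); the class and method names are hypothetical, and it assumes that a double quote directly after a digit is an inch mark while any other quotes delimit phrases.

```java
import java.util.regex.Pattern;

public class QueryNormalizerSketch {
    // Assumption: a double quote right after a digit means "inch"; other quotes delimit phrases.
    private static final Pattern INCH_MARK = Pattern.compile("(?<=\\d)\"");

    static String normalize(String userQuery) {
        // The lookbehind keeps the digit out of the match, so only the quote is replaced.
        return INCH_MARK.matcher(userQuery).replaceAll(" inch");
    }

    public static void main(String[] args) {
        System.out.println(normalize("blue 1\" binder"));     // blue 1 inch binder
        System.out.println(normalize("\"blue binder\" 1\"")); // "blue binder" 1 inch
    }
}
```

Because only the unit marks are rewritten, any remaining quotes stay balanced, so the query parser no longer strips them and phrase queries keep working.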

Mogsdad
Fuad Efendi