Currently, I am facing the following small problem while doing exact search (query enclosed within double quotes).

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "\"sale\"",
      "indent": "true",
      "fl": "displayValue, categoryName, approved, averageRating, lastOneWeekCount, connectorName, score",
      "wt": "json",
      "_": "1579279511471"
    }
  },
  "response": {
    "numFound": 918,
    "start": 0,
    "maxScore": 11.044312,
    "docs": [
      {
        "displayValue": "Net Sales  Vs Contribution Margin",
        "categoryName": "Sales Analytics (B07)",
        "connectorName": "New BOBJ",
        "lastOneWeekCount": 3,
        "approved": "yes",
        "averageRating": 4,
        "score": 11.044312
      },

The "sale" query above matches the term "Sales" in the indexed data, which is not an exact match. This happens because of the EdgeNGramFilterFactory in the text field type I have defined (which uses a whitespace tokenizer). I have incrementally resolved other search issues with the current select request handler, and now I need to solve this exact-match problem. My solrconfig details follow.

    <lst name="defaults">
      <str name="exact">false</str>
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="defType">edismax</str>
      <str name="qf">
         displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2 
         categoryName^8 reportOwner^2 reportDetailsNameColumn^5 
      </str>
      <str name="pf2">
         displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2 
         categoryName^8 reportOwner^2 reportDetailsNameColumn^5 
      </str>
      <str name="pf3">
         displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2 
         categoryName^8 reportOwner^2 reportDetailsNameColumn^5 
      </str>
      <str name="tie">1</str>
      <str name="mm">100%</str>
      <int name="ps2">3</int>
      <int name="ps3">9</int>
      <int name="qs">0</int>
      <str name="df">text</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, averageRating desc, lastOneWeekCount desc</str>
      <str name="bq">
        query({!boost b=20}approved:"yes")
      </str>
    </lst>
    <lst name="appends">
      <str name="fq">{!switch case.false='*:*' case.true='text_ex:$q' v=$exact}</str>
    </lst>
  </requestHandler>

In the config above, I tried to solve the exact search problem by adding a switch query parser (after searching the net). Basically, I want exact search whenever the user's query is enclosed in double quotes. My attempt uses the switch query parser to turn on exact search when the request specifies exact=true. But I am stuck, as I am not getting any results. Can someone please help?

P.S. I am attaching the schema definition as well. Please check.

    <fieldType name="text_ws" class="solr.TextField" omitNorms="false">
        <analyzer type="index" omitTermFreqAndPositions="false">  
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />    
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer> 
        <analyzer type="query"> 
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" /> 
        </analyzer> 
    </fieldType>


    <fieldType name="text_exact" class="solr.TextField" omitNorms="false">
        <analyzer type="index" omitTermFreqAndPositions="false">  
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" 
            catenateWords="0" catenateNumbers="0" preserveOriginal="1" catenateAll="0" splitOnCaseChange="0"/>
            <filter class="solr.LowerCaseFilterFactory" />    
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer> 
        <analyzer type="query"> 
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" 
            catenateWords="0" catenateNumbers="0" preserveOriginal="1" catenateAll="0" splitOnCaseChange="0"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer> 
    </fieldType>
racsan308

1 Answer

Using double quotes does not mean exact. It only allows you to make phrase queries where the terms have to appear after each other. Solr (Lucene) searches against the tokens you've generated.

Use a field with a specific definition that does not change the tokens (i.e. no ngrams, no stemming, etc.). If you only want to match the whole field exactly (but case-insensitively), use a KeywordTokenizer with a LowerCaseFilter. If you only want case-sensitive, exact hits for the whole field, use a string field.

If you want exact matches against each term, use a tokenizer with the behavior you're after, and pick filters to normalize case (i.e. to make it case insensitive) or not. You then decide which field to query based on whether the user is asking for an exact search or not.
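
As a rough illustration of that last step, here is a minimal Python sketch of application-side field selection. It assumes the request's `qf` may override the handler default, that `text_ex` is the exact-match field from the question, and that the non-exact field list is a shortened stand-in for the real one. The function name `choose_query_params` is hypothetical.

```python
# Field lists are assumptions: text_ex is the exact-match copyField target
# from the question, NGRAM_QF is a shortened stand-in for the real qf.
EXACT_QF = "text_ex"
NGRAM_QF = "displayValue^20 categoryName^8"


def choose_query_params(user_query: str) -> dict:
    """Pick which fields to search based on whether the input is fully quoted."""
    q = user_query.strip()
    # Treat the query as "exact" only if it both starts and ends with a quote.
    is_exact = len(q) >= 2 and q.startswith('"') and q.endswith('"')
    return {
        "q": q,
        "defType": "edismax",
        "qf": EXACT_QF if is_exact else NGRAM_QF,
    }
```

With this sketch, `choose_query_params('"sale"')` would search only `text_ex`, while `choose_query_params('sale')` would keep the ngram-backed fields.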

You're going to have to determine how "foo" bar should behave and how "foo bar" baz should behave as well.
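
One possible way to reason about such mixed input is to split it into quoted phrases and loose terms before deciding which field each part goes to. The following Python sketch shows only the splitting step. The function name `split_query` is hypothetical, and how the two lists are then mapped to fields is left open.

```python
import re

# Either a balanced double-quoted phrase, or a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_query(user_query: str):
    """Return (quoted_phrases, loose_terms) found in the user's input."""
    phrases, terms = [], []
    for match in _TOKEN.finditer(user_query):
        if match.group(1) is not None:
            phrases.append(match.group(1))  # text inside the quotes
        else:
            terms.append(match.group(2))    # unquoted term
    return phrases, terms
```

Here `split_query('"foo" bar')` yields one phrase (`foo`) and one loose term (`bar`). An unbalanced quote is not treated as a phrase and simply ends up among the loose terms.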

MatsLindh
  • Many people consider double quoted search as exact match search and so does my client. I need to extend the solr functionality of treating double quote search as phrase queries (with query slop zero for me), to term queries as well. Please find the updated question with schema definition, because even after `KeywordTokenizer` was included, it didn't help solve the exact match problem. Also, as of now, my clientele searches are limited to either searching with double quotes or without them, so I don't have to think about mixed queries. Please help. – racsan308 Jan 19 '20 at 11:14
  • Your word delimiter factory is still going to split text based on rules (for example split on `-`). Could you add an example of what you've indexed, and what your complete query looks like? – MatsLindh Jan 19 '20 at 11:33
  • For example, if I search for `"sale"`, Solr should return exact matches of the term `sale`, but it is returning matches for `Sales`. I have metadata of reports indexed as report title (displayValue), report category (categoryName), etc. The behaviour is due to the `EdgeNGramFilterFactory`, which I need when queries are not enclosed within double quotes. Let me know if you need any clarification. – racsan308 Jan 19 '20 at 13:17
  • You'll have to switch which fields you search against based on whether the query starts with `"` or not. This is a job for your controller. – MatsLindh Jan 19 '20 at 16:24
  • I was trying to implement the exact search by following this [link](https://lucidworks.com/post/custom-solr-request-params/). But I am unable to generate the query URL as needed to make the switch query parser work. Can you please check? @MatsLindh – racsan308 Jan 20 '20 at 09:57
  • That's hard to say unless you also include what query URLs you've tried. Does it work if you make the query directly without going through the switch query parser? – MatsLindh Jan 20 '20 at 10:34
  • Yes, it works directly without going through the switch query parser, but I am unable to make it work for the exact search. I am working through the Solr Admin UI panel, so I get the generated query URL after I type a query into my core in the Admin UI, and I do not know how to enable this switch feature of my request handler while querying there. If you could share your personal contact, I would be happy to show you my problem exactly. – racsan308 Jan 20 '20 at 11:27
  • URL with exact=true (from above solrconfig details): http://localhost:8984/solr/synoptics_core/select?q=sale+AND+exact%3Dtrue&fl=displayValue%2C+reportDetailsNameColumn%2C+categoryName%2C+approved%2C+averageRating%2C+lastOneWeekCount%2C+connectorName%2C+score&wt=json&indent=true – racsan308 Jan 21 '20 at 06:45
  • URL with exact=false (default): http://localhost:8984/solr/synoptics_core/select?q=sale&fl=displayValue%2C+reportDetailsNameColumn%2C+categoryName%2C+approved%2C+averageRating%2C+lastOneWeekCount%2C+connectorName%2C+score&wt=json&indent=true – racsan308 Jan 21 '20 at 06:46
  • I think you've misunderstood where the `switch` qparser takes its parameter from - `$exact` refers to a URL query parameter, not a "field" in the query itself. `?q=sale&..&exact=true` would be the way to supply `$exact` to the `v` parameter. In your example queries, it'd be possible to just prefix the field name `text_exact` directly to your query as well. – MatsLindh Jan 21 '20 at 08:54
  • I do not want to query using the field name as prefix, as I need to parse what the client enters (in the search form of the app). I cannot expect them to type the field prefix while querying. That being said, if `$exact` refers to a URL parameter, then how do I generate the required query to trigger the switch qparser using the edismax query parser in the solrconfig.xml? – racsan308 Jan 21 '20 at 10:15
  • As mentioned, you append `&exact=true` to your query string. Your configuration uses `$exact`, which refers to the value supplied as part of the query string. – MatsLindh Jan 21 '20 at 10:18
  • I have appended `&exact=true` to the query URL and it returns results, but not the exact matches I expect, even with the `&exact=true` parameter. The query `"sale"` is matching against `Sales` again. Is there something wrong in the schema definition? Can you have a look at it? I am thinking the wordt – racsan308 Jan 21 '20 at 12:39
  • You're going to have to debug that issue separately first; does `text_exact:sale` match documents with `Sales`? Use the `Analysis` page under Solr Admin to see exactly how your content is being processed for each step for the field type you've defined. – MatsLindh Jan 21 '20 at 12:43
  • Hi @MatsLindh `text_exact:sale` isn't giving any match. I am guessing this is because of `text_exact` field definition. Firstly, `text_exact` field is a copyField destination for every other source text field/s (among these there are multivalued fields). Secondly, the analysis page is giving results as expected from the schema definition. But when I search for `text_exact:sale`, there are no results. Also, can you please confirm the correctness of the switch query filter condition in the request handler above? Especially this part `case.true="text_ex:$q"`. No data in text_ex, I suppose. – racsan308 Jan 23 '20 at 18:40
  • No, that should probably be `text_exact`, but you're going to have to debug why `text_exact:sale` isn't giving any hits. If you're doing analysis and it gives the expected result (??) you should get a hit. If you didn't reindex after changing the definitions, you'll have to do that. – MatsLindh Jan 23 '20 at 23:07
  • Here text_ex is the field and text_exact is the field type. Can you now clarify if this is correct?...i.e. `"text_ex:$q"` ?... I will try reindexing again – racsan308 Jan 24 '20 at 09:26
  • Ah, in that case, I meant `text_ex` in your test with `sale` above (and that you said it's a destination for copy fields). It's hard to keep track of these names across questions, sorry. You don't query a field type, just a field. – MatsLindh Jan 24 '20 at 09:48
  • Yes, I agree, it's hard to keep track of names. By the way, I have tried querying for `market` after reindexing and I did not get results. But, when I tried querying normally (without double quotes), there were reports having only `market` terms (which are expected to be returned when one does the exact search). Exact search query URL. Please check. [link](http://localhost:8984/solr/synoptics_core/select?q=sale&exact=true&wt=json&indent=true&debugQuery=true) `"filter_queries":["{!switch case.false='*:*' case.true='text_ex:$q' v=$exact}"], "parsed_filter_queries":["text_ex:q"]` – racsan308 Jan 24 '20 at 10:44
  • Also, I am wondering whether this exact search could be achieved another way, as the issue described in this link is exactly what my client needs too. Please have a look [link](https://stackoverflow.com/questions/24344740/solr-exact-match-regarding-words-number). I have tried this method (CloneFieldUpdateProcessorFactory), but a Solr loading error occurred when I restarted. Can you shed some light on the _prefix_ and _suffix_ method described in the link? – racsan308 Jan 24 '20 at 10:53
  • I have added more details in this [link](https://stackoverflow.com/questions/59931921/solr-exact-match-results-not-matching), which continues the conversation we had above. Can you please check? – racsan308 Jan 27 '20 at 21:54
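
To make the point from the comments about `$exact` concrete: it is read from a separate request parameter, not from text inside `q`. The Python sketch below builds such a URL. The function name `build_select_url` is hypothetical, and the base URL is simply the one quoted in the comments.

```python
from urllib.parse import urlencode


def build_select_url(core_url: str, user_query: str, exact: bool) -> str:
    """Build a /select URL where `exact` is its own parameter, not part of q."""
    params = {
        "q": user_query,                        # only the user's search text
        "exact": "true" if exact else "false",  # read by v=$exact in the fq
        "wt": "json",
        "indent": "true",
    }
    return f"{core_url}/select?{urlencode(params)}"
```

For example, `build_select_url("http://localhost:8984/solr/synoptics_core", "sale", True)` produces `...select?q=sale&exact=true&wt=json&indent=true`, in contrast to the earlier attempt, where `exact%3Dtrue` ended up inside `q`. Separately, the `parsed_filter_queries` output of `text_ex:q` suggests that `$q` is not dereferenced inside the quoted `case.true` value. A nested form such as `case.true='{!lucene df=text_ex v=$q}'` may behave differently, though that is an untested assumption here.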