In solrconfig.xml file

Copy Fields

<copyField source="Name" dest="NameKeywords"/>
<copyField source="Keywords" dest="NameKeywords"/>

New Field

<field name="NameKeywords" type="NameKeywordFieldType" indexed="true" stored="true" multiValued="true"/>

Custom Field Type

<fieldType name="NameKeywordFieldType" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.HyphenatedWordsFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.HyphenatedWordsFilterFactory"/>
  </analyzer>
</fieldType>

When I search on the NameKeywords field, nothing matches (an empty array is returned).

Result of search with NameKeywords

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"NameKeywords:black",
      "_":"1582270957982"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }}

But when I search on the Name field, everything works fine.

Result of search with Name

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"Name:black",
      "fl":"Name",
      "rows":"2",
      "_":"1582270957982"}},
  "response":{"numFound":32560,"start":0,"docs":[
      {
        "Name":"40037 Black And Stripe Top, Black & Stripe / 10"},
      {
        "Name":"40037 Black And Stripe Top, Black & Stripe / 12"}]
  }}

So what is wrong with the NameKeywords field?

Prashant Patil
  • I changed "KeywordTokenizerFactory" to "StandardTokenizerFactory" and it's working. – Prashant Patil Feb 21 '20 at 09:19
  • Why are you using the `KeywordTokenizer` if you want to split the text on whitespace? Use the `StandardTokenizer` or the `WhitespaceTokenizer` in that case. – MatsLindh Feb 21 '20 at 09:20
  • Can you please advise which is best for search as well as filters? – Prashant Patil Feb 21 '20 at 09:21
  • I got the answer here: https://stackoverflow.com/questions/11183017/difference-between-whitespacetokenizerfactory-and-standardtokenizerfactory – Prashant Patil Feb 21 '20 at 09:26
  • There is no one thing that is "best for search as well as filters". You have to define the set of behaviors that match what you want - each use case will be different. In general, using the same field for filtering/faceting/searching will usually not give a good experience for either. Use different fields for different use cases and behaviors. – MatsLindh Feb 21 '20 at 10:08
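
The advice in the last comment could be sketched roughly as below. This is only an illustration: the names NameSearch, NameFilter, NameSearchType and NameFilterType are hypothetical and do not appear in the original schema, and the analyzer chain is deliberately minimal.

```xml
<!-- Hypothetical example: one source field copied into two purpose-specific fields. -->

<!-- Tokenized, lowercased type intended for full-text search, e.g. q=NameSearch:black -->
<fieldType name="NameSearchType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Untokenized type intended for exact-match filtering and faceting -->
<fieldType name="NameFilterType" class="solr.StrField" sortMissingLast="true" docValues="true"/>

<field name="NameSearch" type="NameSearchType" indexed="true" stored="false"/>
<field name="NameFilter" type="NameFilterType" indexed="true" stored="false"/>

<!-- Both fields are populated from the existing Name field. -->
<copyField source="Name" dest="NameSearch"/>
<copyField source="Name" dest="NameFilter"/>
```

With a layout like this, searches would go against NameSearch, while filter queries and facets would use NameFilter, which keeps the original value intact.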

2 Answers

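Because you are using the KeywordTokenizerFactory, the field value is not split into tokens: the entire text is treated as a single token. A Name such as "40037 Black And Stripe Top, Black & Stripe / 10" is therefore indexed as one long (lowercased) term, so a query for NameKeywords:black cannot match it. To get a hit, you would have to search for that entire single token.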

<analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>

In: "Please, email john.doe@foo.com by 03-09, re: m37-xq."

Out: "Please, email john.doe@foo.com by 03-09, re: m37-xq."

If you want the text split into tokens, use the StandardTokenizer or the WhitespaceTokenizer as the tokenizer instead of the KeywordTokenizer.

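The StandardTokenizer splits the text into tokens, treating whitespace and punctuation as delimiters and discarding them. A period that is not followed by whitespace is kept as part of the token, which is why "john.doe" and "foo.com" stay intact below.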

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>

In: "Please, email john.doe@foo.com by 03-09, re: m37-xq."

Out: "Please", "email", "john.doe", "foo.com", "by", "03", "09", "re", "m37", "xq"

A WhitespaceTokenizer is a tokenizer that divides text at whitespace.

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory" rule="java" />
</analyzer>

In: "To be, or what?"

Out: "To", "be,", "or", "what?"
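
To make the trade-off concrete for the data in the question, here is a minimal sketch of a whitespace-based field type. The name NameWhitespaceType is hypothetical, and the chain is reduced to a tokenizer plus lowercasing; the In/Out comment shows what plain whitespace splitting followed by lowercasing produces for one of the Name values above.

```xml
<!-- Hypothetical field type: split on whitespace only, then lowercase. -->
<fieldType name="NameWhitespaceType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!--
  In:  "40037 Black And Stripe Top, Black & Stripe / 10"
  Out: "40037", "black", "and", "stripe", "top,", "black", "&", "stripe", "/", "10"
-->
```

A query for black would match such a field, but a query for top would not, because the indexed token is "top," with the comma attached. If punctuation should be stripped, the StandardTokenizer is usually the better fit.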

Abhijit Bashetti

I changed KeywordTokenizerFactory to StandardTokenizerFactory in both the index and query analyzers:

<fieldType name="NameKeywordFieldType" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.HyphenatedWordsFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.HyphenatedWordsFilterFactory"/>
  </analyzer>
</fieldType>

Prashant Patil