Search in Solr with also hashtag included

Question

Suppose I search Solr for the keyword IPL. I want the results to include both IPL and #IPL. How can I achieve this?

I tried WordDelimiterFilterFactory in both the index and query analyzers, but it didn't work.

I think I have to split the string into "string" and "#string", but I do not know how to do that.

Vinod · Answer 1 · 2016-04-29T09:48:14.423

0

If you want every keyword to also match #keyword, you can use the OR operator in the query, like this:

/select?q="IPL" OR "#IPL"

If you want to search in a specific field:

/select?q=title:"IPL" OR title:"#IPL"
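To do this for any keyword rather than just IPL, the OR query can be built on the client side. Below is a minimal Python sketch, assuming the standard /select handler; the helper name `hashtag_query` is made up for illustration and is not part of Solr. It also URL-encodes the query, because a raw # in a URL starts the fragment and would not be sent to the server.

```python
from urllib.parse import urlencode


def hashtag_query(keyword, field=None):
    """Build a /select query string matching both `keyword` and `#keyword`.

    Hypothetical helper for illustration; it is not part of Solr.
    """
    # Escape backslashes and quotes so the keyword stays inside the quoted phrase.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    prefix = f"{field}:" if field else ""
    q = f'{prefix}"{safe}" OR {prefix}"#{safe}"'
    # urlencode percent-encodes '#' as %23 so it is not treated as a URL fragment.
    return "/select?" + urlencode({"q": q})


print(hashtag_query("IPL"))                 # /select?q=%22IPL%22+OR+%22%23IPL%22
print(hashtag_query("IPL", field="title"))  # /select?q=title%3A%22IPL%22+OR+title%3A%22%23IPL%22
```

Note that this only helps if the field's analyzer keeps the # at index time (for example, a whitespace tokenizer); otherwise both clauses match the same tokens.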

You may also try synonyms, although in this case the synonym is just the keyword prefixed with #.

Go to the config files of your Solr instance and, inside the conf folder, edit the synonyms.txt file:

IPL => IPL, #IPL

Or change the query analyzer's tokenizer for the field in the schema.xml file, so that the # is not stripped:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Then query with the field name: /select?q=text:#IPL (URL-encode the # as %23 when sending the request).

Since text is the default field in Solr, you can just try /select?q=#IPL

edited Apr 29 '16 at 09:48

answered Apr 29 '16 at 03:20

I would like to do this for all search keywords, not just IPL. How do I configure that in synonyms.txt? – Babu Apr 29 '16 at 03:23
Your query term is tokenized. solr.StandardTokenizerFactory will remove the # from your keyword. – Vinod Apr 29 '16 at 06:33
I am using WhitespaceTokenizerFactory. – Babu Apr 29 '16 at 08:24
I modified the answer, check it. Change your query field tokenizer in the schema.xml file. – Vinod Apr 29 '16 at 09:49

score 0 · Answer 2 · answered Apr 29 '16 at 12:57

This is done with the WordDelimiterFilterFactory. Set generateWordParts=1 and also keep preserveOriginal=1. This keeps the original token (#IPL) and additionally creates a new token without the # (IPL).

After modifying schema.xml, restart the server and re-index the data.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
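To see why this makes a search for IPL match documents containing #IPL, here is a rough Python emulation of the tokens the chain above would produce. It is only a sketch under simplifying assumptions, not Solr's actual algorithm: the function name `emulate_analysis` is made up, and stop words, catenation, the length filter, protected words and stemming are all ignored.

```python
import re


def emulate_analysis(text):
    """Very rough emulation of the analyzer chain, for illustration only.

    Not Solr's implementation: stop words, catenation, the length filter,
    protected words and stemming are deliberately left out.
    """
    tokens = []
    for raw in text.split():  # WhitespaceTokenizerFactory: split on whitespace only
        candidates = [raw]    # preserveOriginal="1": keep the token as typed
        # generateWordParts / generateNumberParts: letter runs and digit runs
        candidates += re.findall(r"[A-Za-z]+|[0-9]+", raw)
        seen = []
        for tok in candidates:
            tok = tok.lower()    # LowerCaseFilterFactory
            if tok not in seen:  # crude stand-in for RemoveDuplicatesTokenFilterFactory
                seen.append(tok)
        tokens.extend(seen)
    return tokens


print(emulate_analysis("#IPL"))  # ['#ipl', 'ipl']
print(emulate_analysis("IPL"))   # ['ipl']
```

Because the indexed #IPL also yields the token ipl, a query for IPL finds it, while a query for #IPL still matches through the preserved original. The Analysis screen in the Solr admin UI shows the real token stream for your own field type.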
