
I have indexed a small collection (about 150k documents). I let users run filtered queries through dropdown boxes. The “field query” fields are: apo_taxonomy, apo_dik, apo_number, and apo_date. Below is a portion of schema.xml:

<fieldType name="text_efe_dioi_s" class="solr.TextField" positionIncrementGap="100" >
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
            <filter class="solr.GreekLowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
            <filter class="solr.GreekLowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>


    <fieldType name="text_efe_dioi" class="solr.TextField" positionIncrementGap="100">    
      <analyzer type="index">       
        <tokenizer class="solr.StandardTokenizerFactory"/>      
        <filter class="solr.GreekLowerCaseFilterFactory"/>
        <filter class="solr.GreekStemFilterFactory"/>
      </analyzer>     
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>      
        <filter class="solr.GreekLowerCaseFilterFactory"/>
        <filter class="solr.GreekStemFilterFactory"/>       
      </analyzer>
    </fieldType>


<fields>
<field  name="ida" type="string" indexed="true" stored="true" multiValued="false"/>
  <field  name="solr_id" type="string" indexed="true" stored="true" multiValued="false"/> 
  <field  name="apo_number" type="text_efe_dioi" indexed="true" stored="true" multiValued="true"/>
  <field  name="apofasi_date" type="text_efe_dioi" indexed="true" stored="true"/>
  <field  name="apo_dik" type="text_efe_dioi" indexed="true" stored="true"/>
  <field  name="apo_taxonomy" type="text_efe_dioi" indexed="true" stored="true"/>
  <field  name="content" type="text_efe_dioi" indexed="true" stored="true" multiValued="true"/> 
  <field  name="type" type="string" indexed="true" stored="true"/>  
  <field  name="model" type="string" indexed="true" stored="true" multiValued="false"/>  
  <field  name="url" type="string" indexed="true" stored="true"/>
  <field  name="search_tag" type="text_efe_dioi" indexed="true" stored="true"/>
  <field  name="contentbin" type="text" indexed="true" stored="true" multiValued="true"/>
  <field  name="last_modified" type="string" indexed="true" stored="true"/>  
  <field  name="title" type="text_efe_dioi" indexed="true" stored="true" multiValued="true"/>
  <field  name="grid_title" type="text_efe_dioi" indexed="true" stored="true"/>
  <field  name="contentS" type="text_efe_dioi_s" indexed="true" stored="true"/>
 </fields>

<copyField source="apo_number" dest="content" />       
   <copyField source="apo_date" dest="content" />   
   <copyField source="apo_dik" dest="content" />   
   <copyField source="apo_taxonomy" dest="content" />   
   <copyField source="title" dest="content" />    
   <copyField source="search_tag" dest="content" />
   <copyField source="contentbin" dest="content"/>     
   <copyField source="content" dest="contentS" />

I also provide the portion of solrconfig.xml that configures the “SearchHandler”. I set it up this way in order to boost “exactish” (anchored) phrase matches:

<requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <!--<str name="defType">edismax</str>
       <str name="qf">content contentS^10</str>
       <str name="pf">content^10 contentS^100</str>
       <str name="ps">100</str>-->
       <str name="echoParams">explicit</str>
       <int name="rows">150</int>
       <str name="sort">score desc</str>
       <str name="defType">edismax</str>
       <str name="qf">content contentS^10</str>
       <str name="pf">content^10 contentS^100</str>
       <str name="ps">100</str>
       <str name="wt">json</str>
       <str name="hl">true</str>       
       <str name="fl">solr_id,ida,type,model,keywordlist,title,apo_taxonomy,apo_dik,apo_date,grid_title</str>
       <str name="hl.fl">content,title</str>
       <str name="f.content.hl.alternateField">content</str>
       <str name="hl.maxAlternateFieldLength">800</str>
       <str name="hl.fragsize">800</str>       
     </lst>  
    </requestHandler>

Some relevant notes:

  1. The “apo_taxonomy” field can hold values like: “Πόρτα”, “Πόρτα-1”, and “Πόρτα-ασφ1”
  2. The “apo_dik” field can hold values like: “Μια”, “Μιάμιση”, and “ΟΧΤΟ”
  3. The “apo_date” and “apo_number” fields can hold numeric values.
  4. All the above fields use “”. I use the "solr.TextField" class so that I can copy these fields into one field (“content”) and make them searchable via Solr’s basic query (“q” parameter).
  5. The whole collection is in Greek.

My questions:

  1. When a user selects (using the dropdown boxes) an apo_taxonomy value of “Πόρτα”, Solr also returns documents containing “Πόρτα-1” and “Πόρτα-ασφ1” (http://example.com/solr/efe_dioi/select/?q=:&fq=apo_taxonomy:( Πόρτα)+apo_date:(2009)&start=0&rows=100). This is not what the user needs: someone who filters the collection for “Πόρτα” documents (apo_taxonomy) does not want to see documents of “Πόρτα-1” and/or “Πόρτα-ασφ1”. Is that feasible using “solr.TextField”? As you may have noticed, I need the “filter fields” to be searchable using the “q” parameter, plus the boost on “exactish” matches.

  2. I am thinking of adding one more filter: “apo_ses”. The field would hold values like “ΜΕΡΑ”, “ΜΕΣΗΜΕΡΙ”, “ΑΠΟΓΕΥΜΑ”, and “ΒΡΑΔΥ”. Is it possible to instruct Solr, when filtering on a value such as “ΜΕΡΑ”, to return documents filtered by “ΜΕΡΑ” AND “ΜΕΣΗΜΕΡΙ”, or by “ΜΕΡΑ” OR “ΜΕΣΗΜΕΡΙ”?

Any help would be greatly appreciated.

I hope I have not bored you with my writing.

sehe

1 Answer


For your question 1, I suggest using the string type. If the field (for example, apo_taxonomy) is also going to be used for search, then consider adding apo_taxonomy_exact with the string type for fq, where apo_taxonomy_exact is a copy of apo_taxonomy in its non-tokenized form, kept for filtering only. <copyField source="apo_taxonomy" dest="apo_taxonomy_exact" /> The type for apo_taxonomy_exact would be:
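A minimal sketch of how that could look in schema.xml, assuming the "string" fieldType (class solr.StrField) that the question's schema already uses for fields such as ida and model. The attribute values below (indexed, stored) are assumptions, not something taken from the original setup:

```xml
<!-- Sketch only. apo_taxonomy_exact is the field name proposed in this answer;
     indexed/stored values are assumptions modelled on the question's schema. -->
<field name="apo_taxonomy_exact" type="string" indexed="true" stored="false"/>

<!-- copyField passes the raw source value, before any analysis, so the
     string field keeps values such as "Πόρτα-1" as one untokenized term. -->
<copyField source="apo_taxonomy" dest="apo_taxonomy_exact"/>
```

The dropdown filter would then target the new field, for example fq=apo_taxonomy_exact:"Πόρτα", while apo_taxonomy keeps its tokenized type and is still copied into content for the “q” parameter. A string field matches the whole value exactly, including case and accents, so the dropdown value has to be identical to what was indexed, and existing documents need to be reindexed after the schema change.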

For your second question, yes, do something like fq=apo_ses:(("ΜΕΡΑ" AND "ΜΕΣΗΜΕΡΙ") OR "ΜΕΡΑ" OR "ΜΕΣΗΜΕΡΙ")

Arun
  • For answer 1: How do you suggest indexing "apo_taxonomy_exact"? Notice that "apo_taxonomy" is a string, not a number. For answer 2: I am aware of that kind of query. What I am asking is whether the user can select one value from the dropdown box (e.g. "ΜΕΡΑ") and have the filter become “ΜΕΡΑ” AND “ΜΕΣΗΜΕΡΙ” using a "solrish" approach. Maybe "aliasing"? –  Jan 07 '14 at 16:12
  • What I am saying is that when I filter on (for example) apo_taxonomy with the value Πόρτα, Solr returns docs for "Πόρτα", "Πόρτα-1", and "Πόρτα-2". What tokenizer should I use for the new field "apo_taxonomy_exact"? –  Jan 07 '14 at 16:18
  • Your apo_taxonomy is not a "string"; it is of type="text_efe_dioi", which is a solr.TextField and is therefore tokenized. The "string" type is non-tokenized. Please see my update. For question 2, no, there is no Solr-ish approach. Think of it this way: the search engine should not handle rules like this. It is better to keep it clean and handle special rules outside of Solr, so that you have the most flexibility. – Arun Jan 07 '14 at 18:33
  • Should I use the "solr.StrField" class? –  Jan 07 '14 at 18:47
  • Yes, so your field will need to use "string". – Arun Jan 07 '14 at 20:11