Huge time difference in facet queries

Question

I have a SOLR DB with ca. 70M documents. A certain query returns about 300 documents. With

facet.field=A it takes only 4 ms,
facet.field=B needs 800 ms to return!

Are there errors in my schema? Can it be done faster?

<fieldtype name="B_type" class="solr.TextField" positionIncrementGap="100"    
           sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.StandardFilterFactory" ignoreCase="true" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.StandardFilterFactory" ignoreCase="true" />
    </analyzer>
</fieldtype>

<field name="A" type="string" indexed="true" stored="true" multiValued="false" />
<field name="B" type="B_type" indexed="true" stored="false" multiValued="true" />

score 6 · Accepted Answer · edited Jun 20 '20 at 09:12

Field A is of type string, which is well suited for use as a facet. Your field B is analyzed: you strip off special characters and you lowercase it, which is not so good for use as a facet. These latter steps happen when the StandardFilterFactory is applied.

Solr's Wiki has an interesting part about facets:

Because faceting fields are often specified to serve two purposes, human-readable text and drill-down query value, they are frequently indexed differently from fields used for searching and sorting:

They are often not tokenized into separate words

They are often not mapped into lower case

Human-readable punctuation is often not removed (other than double-quotes)

There is often no need to store them, since stored values would look much like indexed values and the faceting mechanism is used for value retrieval.

As you can see, you are missing the two points in the middle: you lowercase the values and you remove special characters.

As advised in "Indexing Fields with SOLR and LowerCaseFilterFactory", you should introduce a new field in your schema, of type string, and keep it in sync with your field B via copyField. Use that new field for faceting; it should be quicker. We usually name such fields with a suffix, like B_raw.
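A minimal sketch of what that could look like in schema.xml, assuming the string type already exists in your schema (field A uses it) and that B_raw is only an illustrative name following the suffix convention:

```xml
<!-- Hypothetical facet-only field: untokenized, not lowercased, not stored.
     multiValued mirrors field B from the question. -->
<field name="B_raw" type="string" indexed="true" stored="false" multiValued="true" />

<!-- Copies the raw input value (before any analysis) into B_raw at index time.
     Assumption: B receives values directly. copyField directives do not chain,
     so if B is itself only the destination of a copyField, use the original
     source field here instead. -->
<copyField source="B" dest="B_raw" />
```

You would then facet with facet.field=B_raw instead of facet.field=B. Because copyField is applied at index time, existing documents have to be reindexed before the new field contains any values.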

Since you have 70M documents, it would be a good idea to test this on a subset first to save time.

my B-field is already a copy from "B-origin" and used only for faceting. I will try to declare it string and use LowerCaseFilterFactory. — Stefan Weiss, Dec 18 '13 at 08:09
Looks good for 1M documents. has to reindex the whole DB next days and make a "big" test — Stefan Weiss, Dec 18 '13 at 09:10
