
I'm trying to use the contextField in a Solr suggester (running Solr 7).

But when I try to build the suggester, I get this error:

Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="exacttext" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is...
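
For context, the limit in this message is counted in UTF-8 bytes, not characters, and it applies to a single term. Below is a minimal Python sketch of that check, assuming a value ends up indexed whole as one term. Whether that is what happens to the suggestion text here is an assumption, and the helper names utf8_length and is_immense are hypothetical, not part of Solr.

```python
# Maximum length of a single indexed term in UTF-8 bytes,
# taken from the error message above.
MAX_TERM_BYTES = 32766


def utf8_length(value: str) -> int:
    """Return the number of bytes `value` occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


def is_immense(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """True if `value`, treated as one single term, would exceed the limit."""
    return utf8_length(value) > limit


if __name__ == "__main__":
    short_title = "Ein kurzer Titel"
    long_body = "ä" * 20000  # 20,000 characters, 2 bytes each in UTF-8
    for label, text in (("title", short_title), ("body", long_body)):
        print(label, utf8_length(text), is_immense(text))
```

A full article body can pass this limit easily if it is kept as one term, while the individual words produced by a tokenizer normally would not.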

There is no field named exacttext in my entire setup, though.

When I use the FreeTextLookupFactory, this doesn't happen, but then I can't use the contextField, of course.

I tried adding

<filter class="solr.LengthFilterFactory" min="2" max="32700"/>

as well as

<filter class="solr.TruncateTokenFilterFactory" prefixLength="100"/>

to the managed schema, but this didn't work, either.

Here's the searchComponent I tried using:

<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">suggestText</str>
      <str name="highlight">false</str>
      <str name="storeDir">mySuggester</str>
      <str name="separator"> </str>
      <str name="suggestAnalyzerFieldType">suggestField</str>
      <str name="buildOnCommit">false</str>
      <str name="buildOnStartup">false</str>
      <str name="contextField">context_field</str>
    </lst>
</searchComponent>

And this is the field type for the suggester in the managed-schema:

<fieldType name="suggestField" class="solr.TextField" positionIncrementGap="100">
    <analyzer>

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" generateNumberParts="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="2" max="32700"/>
        <filter class="solr.TruncateTokenFilterFactory" prefixLength="100"/>
    </analyzer>
</fieldType>

And these are the field definitions:

    <field name="ID" type="text_general" indexed="true" stored="true"/>
    <field name="TITLE" type="tokenized" indexed="true" stored="true"/>
    <field name="ANRISS" type="tokenized" indexed="true" stored="true"/>
    <field name="LEAD" type="tokenized" indexed="true" stored="true"/>
    <field name="BODY" type="tokenized" indexed="true" stored="true"/>
    <field name="PDFDOC" type="text_general" indexed="true" stored="true"/>
    <field name="MAGID" type="text_general" indexed="true" stored="true"/>
    <field name="MAGNAME" type="text_general" indexed="true" stored="true"/>
    <field name="MAGISSUE" type="text_general" indexed="true" stored="true"/>
    <field name="ARTICLETYPE" type="text_general" indexed="true" stored="true"/>
    <field name="IS_FREE" type="text_general" indexed="true" stored="true"/>
    <field name="THEMA" type="text_general" indexed="true" stored="true"/>
    <field name="CREATIONDATE" type="pdate" indexed="true" stored="true"/>
    <field name="LASTUPDATE" type="pdate" indexed="true" stored="true"/>

    <copyField source="TITLE" dest="fulltext" />
    <copyField source="ANRISS" dest="fulltext" />
    <copyField source="LEAD" dest="fulltext" />
    <copyField source="BODY" dest="fulltext" />

    <field name="fulltext" stored="true" type="tokenized" multiValued="true" indexed="true" />

    <copyField source="TITLE" dest="suggestText" />
    <copyField source="ANRISS" dest="suggestText" />
    <copyField source="LEAD" dest="suggestText" />
    <copyField source="BODY" dest="suggestText" />

    <field name="suggestText" stored="true" type="text_general" multiValued="true" indexed="true" />

    <copyField source="MAGID" dest="context_field" />
    <copyField source="ARTICLETYPE" dest="context_field" />

    <field name="context_field" stored="true" type="suggestField" multiValued="true" indexed="true" />
Swissdude
  • You'll probably have to remove the index completely and reindex after making those changes - you might be able to make it work by just reindexing and then issuing an optimize (or, if you're really lucky, just by issuing the optimize, causing the index to be properly rewritten without old fields from deleted documents). – MatsLindh Sep 15 '19 at 17:29
  • Thanks Mats. I actually already tried to reindex. But maybe I didn't apply the correct changes (too much trial and error). I'll give this a shot again. – Swissdude Sep 15 '19 at 19:47
  • You may refer to https://www.drupal.org/project/search_api_solr/issues/2941720 – Shubhangi Sep 16 '19 at 11:10
