Using Solr, how do I configure a fieldType analyzer to keep a number, a space, and the following alphabetic word together as their own token?

Question

I have created a field 'allsearchstr' that is populated by copying approximately 200 other fields into it. I would like the end user to be able to query that field with something like '2 ohm resistors' and get back all docs that have a resistance of 2 ohms.

Sample content of the allsearchstr field for one document looks like the following:

"allsearchstr":["Y",
"Thick Film",
"1",
"0402 RES 1/16W 2R0 5%",
"General Purpose",
"0.35",
"Solder Pads",
"63mW",
"SMT",
"Active",
"2 Ohm",
"RESISTORS",
"CHIP RESISTORS",
"PASSIVE",
"50V",
"Chip Technologies",
"5%",
"0402 (1005)",
"-55 to +155C",
"±200ppm",
"CR0402J2R0T1LF",
"0.5"]

The current fieldType is configured in the schema as:

<fieldType name="utstring" class="solr.TextField" omitNorms="true" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this config, the analyzer emits the whole query '2 ohm resistors' as a single token.
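To make the single-token behaviour concrete, here is a rough Python model of the `utstring` chain (KeywordTokenizerFactory followed by LowerCaseFilterFactory). It is an approximation for reasoning only, not Solr code, and the helper name `keyword_lowercase` is hypothetical. Under this model the document indexes '2 ohm' as its own term, but the query produces the single term '2 ohm resistors', which none of the sample values equals.

```python
# Rough Python model of the "utstring" analyzer in the question (not Solr code):
# KeywordTokenizerFactory emits the whole input as a single token, and
# LowerCaseFilterFactory lowercases it.


def keyword_lowercase(text):
    """Return the one lowercased token the keyword analyzer would produce."""
    return [text.lower()]


# A few values copied from the sample document; each value of the
# multi-valued field is analyzed on its own.
doc_values = ["2 Ohm", "RESISTORS", "CHIP RESISTORS", "0402 RES 1/16W 2R0 5%"]
indexed_terms = {tok for value in doc_values for tok in keyword_lowercase(value)}

# The query string goes through the same chain and also becomes one token.
query_tokens = keyword_lowercase("2 ohm resistors")

print(query_tokens)                      # ['2 ohm resistors']
print("2 ohm" in indexed_terms)          # True
print(query_tokens[0] in indexed_terms)  # False
```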

I have tried various tokenizers and filters but have not found the right combination. I keep ending up with either the whole input as one token or a separate token for every space-delimited word.
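The two outcomes described above, and the behaviour the question asks for, can be contrasted with a small Python sketch. The "emit regex matches as tokens" idea resembles Solr's `solr.PatternTokenizerFactory` used with `group="0"` (each match of the whole pattern becomes a token); whether that fits this schema is an assumption to check in the Solr Admin Analysis screen. The pattern and function names are hypothetical, Java's regex flavour may differ from Python's in edge cases, and the sketch lowercases before tokenizing for brevity, whereas in a Solr chain the tokenizer runs before LowerCaseFilterFactory, so a real pattern would also need to accept uppercase letters.

```python
import re

# Hypothetical rule: "<number> <word>" is one token; any other run of
# non-whitespace is its own token. Alternatives are tried left to right,
# so the number+word branch wins wherever it can match at a position.
NUMBER_WORD_OR_TERM = re.compile(r"\d+(?:\.\d+)?\s+[a-z]+|\S+")


def whitespace_tokens(text):
    """Lowercase, then split on whitespace (one word per token)."""
    return text.lower().split()


def number_word_tokens(text):
    """Lowercase, then emit each regex match as a token."""
    return NUMBER_WORD_OR_TERM.findall(text.lower())


# Values copied from the sample allsearchstr field in the question.
SAMPLE_VALUES = [
    "Y", "Thick Film", "1", "0402 RES 1/16W 2R0 5%", "General Purpose",
    "0.35", "Solder Pads", "63mW", "SMT", "Active", "2 Ohm", "RESISTORS",
    "CHIP RESISTORS", "PASSIVE", "50V", "Chip Technologies", "5%",
    "0402 (1005)", "-55 to +155C", "±200ppm", "CR0402J2R0T1LF", "0.5",
]

query = "2 ohm resistors"
print(whitespace_tokens(query))   # ['2', 'ohm', 'resistors']
print(number_word_tokens(query))  # ['2 ohm', 'resistors']

# Tokens containing a space are the ones where the rule joined a number and a word.
glued = [t for v in SAMPLE_VALUES for t in number_word_tokens(v) if len(t.split()) > 1]
print(glued)  # ['0402 res', '2 ohm']
```

On the sample values, the rule joins '2 ohm' as intended but also joins '0402 res' from "0402 RES 1/16W 2R0 5%", so narrowing the word part (for example, to a list of unit names) may be worth considering. Note that "-55 to" is not joined, because `\S+` already consumes "-55" at that position.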

I am using Solr 7.5.0 on CentOS.

Any assistance pointing me in the right direction would be appreciated.
