Solr copyField mixed with RegexTransformer

Question

Scenario:

In the database I have a field called Categories, which is of type string and contains pipe-delimited numbers such as 1|8|90|130|

What I want:

In Solr index, I want to have 2 fields:

Field Categories_pipe, which would contain the exact string as in the DB, i.e. 1|8|90|130|
Field Categories, which would be a multi-valued field of type int containing the values 1, 8, 90 and 130

For the latter, I can use a RegexTransformer in the entity specification, declare the following field in data-config.xml: <field column="Categories" name="Navigation" splitBy="\|"/>, and then mark the field as multi-valued in schema.xml.
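As a rough sketch of that setup, the two fragments might look like the following. Only the `<field>` mapping comes from the question; the entity name, the query, and the `int` field type name are assumptions for illustration and depend on the actual database and schema.

```xml
<!-- data-config.xml (sketch; entity name and query are hypothetical) -->
<entity name="item" transformer="RegexTransformer"
        query="SELECT Id, Categories FROM Items">
  <!-- splitBy is a regular expression, so the pipe has to be escaped -->
  <field column="Categories" name="Navigation" splitBy="\|"/>
</entity>

<!-- schema.xml (sketch; assumes a numeric fieldType named "int" is defined) -->
<field name="Navigation" type="int" indexed="true" stored="true" multiValued="true"/>
```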

What I do not know is how to 'copy' the same field twice and perform the regex split on only one copy. I know there is the copyField facility that can be defined in schema.xml; however, I can't find a way to transform the copied field because, from what I know (and I may be wrong here), transformers are only available in the entity specification.

As a workaround I could also return the same field twice from the entity query, but in reality the Categories field is a computed field (nested selects) that is somewhat expensive, so I would like to avoid that.

Any help is appreciated, thanks.

score 1 · Accepted Answer · edited May 23 '17 at 11:48

Instead of splitting it in data-config.xml, you could do that in your schema.xml. Here is what you could do:

Create a fieldType whose tokenizer is PatternTokenizerFactory, using a regex that splits on |.
FieldSplit: Create a multivalued field using this new fieldType; it will eventually hold 1, 8, 90, 130.
FieldOriginal: Create a string field (if you need no analysis on it) that preserves the original value 1|8|90|130|
Now you can use copyField to copy the FieldSplit and FieldOriginal values, depending on your needs.
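The steps above could be sketched in schema.xml roughly as follows. The fieldType name `pipe_delimited` and the direction of the copyField are assumptions (one possible reading of the steps), and the sketch assumes a `string` fieldType is already defined. Keep in mind that an analyzer only changes the indexed terms; the stored value of a field is whatever text was sent to it.

```xml
<!-- schema.xml (sketch; "pipe_delimited" is a hypothetical fieldType name) -->
<fieldType name="pipe_delimited" class="solr.TextField">
  <analyzer>
    <!-- with the default group (-1) the pattern acts as a delimiter -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
  </analyzer>
</fieldType>

<!-- keeps the raw value, e.g. 1|8|90|130| -->
<field name="FieldOriginal" type="string" indexed="true" stored="true"/>

<!-- indexed as separate terms: 1, 8, 90, 130 -->
<field name="FieldSplit" type="pipe_delimited" indexed="true" stored="true" multiValued="true"/>

<!-- assumed direction: feed the raw value into the tokenized field -->
<copyField source="FieldOriginal" dest="FieldSplit"/>
```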

See this question; it is similar.

answered Jan 25 '12 at 15:28

mailboat

Thanks for your answer; however, I have a problem with point 3. Although it is a multivalued field, the 'stored' value is still just the single value `1|8|90|130`. The tokenizer does nothing to the stored value; it only changes the internal structure of the index for that value. In fact, that's what the question you referenced says as well. – mrd3650 Jan 26 '12 at 08:46

I could be wrong, but I don't think the PatternTokenizerFactory applies to the *stored value* from a copyField; at least that's the problem I seem to be having. The regex effect shows up in the facets, but not in the stored value for FieldSplit. – ghukill Oct 16 '14 at 18:08

score 1 · Answer 2 · answered Apr 03 '12 at 00:35

You can create two columns from the same data and treat them separately.

SELECT categories, categories as categories_pipe FROM category_table

Then you can split the "categories" column, but index the other one as-is.
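Wired into data-config.xml, that query might look roughly like this. The entity name and the target field names are assumptions based on the question, and the column names must match whatever case the JDBC driver actually returns.

```xml
<!-- data-config.xml (sketch; entity name is hypothetical) -->
<entity name="category" transformer="RegexTransformer"
        query="SELECT categories, categories AS categories_pipe FROM category_table">
  <!-- split on the pipe to produce the multi-valued field -->
  <field column="categories" name="Categories" splitBy="\|"/>
  <!-- no splitBy here, so the raw string is indexed as-is -->
  <field column="categories_pipe" name="Categories_pipe"/>
</entity>
```

Since the alias reuses the already computed column in the select list, the nested selects behind it should not need to be written out a second time, though whether the database evaluates them once is worth checking in the query plan.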

Walter Underwood
