Scenario:

In the database I have a field called Categories, which is of type string and contains a pipe-delimited list of numbers such as 1|8|90|130|

What I want:

In Solr index, I want to have 2 fields:

  • Field Categories_pipe, which would contain the exact string as in the DB, i.e. 1|8|90|130|
  • Field Categories, which would be a multi-valued field of type INT containing the values 1, 8, 90 and 130

For the latter, I can use a RegexTransformer in the entity specification, with the following field in data-config.xml: <field column="Categories" name="Navigation" splitBy="\|"/>, and then declare the field as multi-valued in schema.xml.
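For reference, a minimal sketch of that setup might look like the following. The entity name, the table name and the `Id` column are hypothetical placeholders, the target field is named Categories here to match the goal stated above, and the schema is assumed to define a field type called `int`.

```xml
<!-- data-config.xml: entity name and query are hypothetical placeholders -->
<entity name="item" transformer="RegexTransformer"
        query="SELECT Id, Categories FROM SomeTable">
  <!-- splitBy takes a regex, so the pipe has to be escaped -->
  <field column="Categories" splitBy="\|"/>
</entity>

<!-- schema.xml: assumes a fieldType named "int" is already defined -->
<field name="Categories" type="int" indexed="true" stored="true" multiValued="true"/>
```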

What I do not know is how to 'copy' the same field twice and perform the regex splitting on only one copy. I know there is the copyField facility that can be defined in schema.xml; however, I can't find a way to transform the copied field because, from what I know (and I may be wrong here), transformers are only available in the entity specification.

As a workaround I could also select the same field twice in the entity query, but in reality the field Categories is a computed field (nested selects) which is somewhat expensive, so I would like to avoid that.

Any help is appreciated, thanks.

mrd3650

2 Answers

Instead of splitting it in data-config.xml, you could do that in your schema.xml. Here is what you could do:

  1. Create a fieldType with the tokenizer PatternTokenizerFactory, which uses a regex to split on |.
  2. FieldSplit: Create a multivalued field using this new fieldType; it will eventually hold 1, 8, 90, 130.
  3. FieldOriginal: Create a String field (if you need no analysis on it) that preserves the original value 1|8|90|130|
  4. Now you can use copyField to copy values between FieldSplit and FieldOriginal as needed.
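The steps above could be sketched in schema.xml roughly as follows. The type name `pipe_split` is hypothetical, a `string` field type is assumed to exist in the schema, and the copyField direction (original into split) is an assumption about what was intended.

```xml
<!-- schema.xml sketch; the type name "pipe_split" is hypothetical -->
<fieldType name="pipe_split" class="solr.TextField">
  <analyzer>
    <!-- with the default group="-1", the pattern is treated as a delimiter -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
  </analyzer>
</fieldType>

<!-- assumes a "string" fieldType is already defined in the schema -->
<field name="FieldOriginal" type="string" indexed="true" stored="true"/>
<field name="FieldSplit" type="pipe_split" indexed="true" stored="true" multiValued="true"/>

<!-- assumed direction: feed the raw value into the tokenized field -->
<copyField source="FieldOriginal" dest="FieldSplit"/>
```

Note that a tokenizer only shapes the indexed terms used for searching and faceting; the stored value of a field is kept as it was received.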

See this question, which is similar.

mailboat
  • Thanks for your answer, however I have a problem with point 3. Although it is a multivalued field, the 'stored' value is still just one value of `1|8|90|130`. The tokenizer does nothing to the stored value, it changes the internal structure of the index for that value. In fact that's what the question you referenced says as well. – mrd3650 Jan 26 '12 at 08:46
  • I could be wrong, but I don't think the PatternTokenizerFactory applies to the *stored value* from a copyField; at least that's the problem I seem to be having. The regex effect shows up in the facets, but not in the stored value for FieldSplit. – ghukill Oct 16 '14 at 18:08

You can create two columns from the same data and treat them separately.

SELECT categories, categories as categories_pipe FROM category_table

Then you can split the "categories" column, but index the other one as-is.
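As a rough sketch, the aliased query could be wired into data-config.xml like this. The entity name is a hypothetical placeholder, and the target field names follow the ones asked for in the question.

```xml
<!-- data-config.xml sketch; the entity name "category" is hypothetical -->
<entity name="category" transformer="RegexTransformer"
        query="SELECT categories, categories AS categories_pipe FROM category_table">
  <!-- split this column on the (escaped) pipe into multiple values -->
  <field column="categories" name="Categories" splitBy="\|"/>
  <!-- pass the aliased column through untouched -->
  <field column="categories_pipe" name="Categories_pipe"/>
</entity>
```

Because the alias reuses the already selected column, the expensive computation should only need to appear once in the query, although whether the database evaluates it once depends on how the computed column is written.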

Walter Underwood