I'm looking at a very old Solr instance (last touched 4-6 years ago), and I'm seeing extra dynamic fields, 'f_' and 'fs_', for multi-valued and single-valued facet fields.

My understanding, though, is that faceting only happens at query time.

Also, it's just a straight copy: the fields don't change type.

So before I nuke these fields to kingdom come: is there a reason to keep facet fields in an index when they are just copies of another field?

Thanks

1 Answer

Facets only happening at query time is a bit of a misnomer: the content (the tokens) that the facet is computed from is generated at index time. A facet gives the number of distinct documents that have a specific token present.

That means that if the field type is identical and only one field is being copied into the destination field, the behaviour of the source and destination fields should be identical.

However, if multiple fields copy content into the same destination field, the results will differ. Also be aware that the type is given by the schema definition of the field; it's not changed by the copyField instruction in any way. A copyField operation happens before any content runs through the field's analysis chain, so the destination receives the original, unprocessed value.
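As a rough illustration of the two paragraphs above, here is a minimal Python sketch of how copying raw values affects facet counts. It is a simplified model and not Solr's implementation; the helper names (`apply_copy_fields`, `facet_counts`) and the field names `f_category` and `f_all` are made up for the example.

```python
from collections import Counter

def apply_copy_fields(doc, copy_rules):
    """Copy raw source values into destination fields, before any analysis.

    Values are always read from the original document, so copies do not chain.
    """
    out = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_rules:
        out.setdefault(dest, []).extend(doc.get(source, []))
    return out

def facet_counts(docs, field):
    """Count how many documents contain each distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))
    return counts

docs = [
    {"category": ["Books"], "tags": ["Sale"]},
    {"category": ["Books"], "tags": ["New"]},
    {"category": ["Music"], "tags": ["Books"]},
]

# One source copied into one destination: facets match the source field.
single = [apply_copy_fields(d, [("category", "f_category")]) for d in docs]

# Two sources copied into the same destination: facets differ from either source.
merged = [apply_copy_fields(d, [("category", "f_all"), ("tags", "f_all")]) for d in docs]

print(facet_counts(single, "category"))
print(facet_counts(single, "f_category"))
print(facet_counts(merged, "f_all"))
```

In this model the single-source copy produces the same counts as its source, while the merged field counts "Books" in all three documents.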

Usually you want facets to be generated on string fields so that the indexed values are kept as-is, while you want to use a text field or similar (with tokenization) for searching, since a string field would only give exact, case-sensitive hits.
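To make the difference concrete, here is a small Python sketch that facets the same values through two stand-in analyzers. The analyzers are assumptions for illustration only: `analyze_string` keeps the value untouched, roughly like a string field, and `analyze_text` lowercases and splits on non-word characters, which only approximates what a typical text field type does.

```python
import re
from collections import Counter

def analyze_string(value):
    """Stand-in for a string field: the whole value is a single token."""
    return [value]

def analyze_text(value):
    """Stand-in for a text field: lowercase, then split into word tokens."""
    return re.findall(r"\w+", value.lower())

def facet_counts(values_per_doc, analyzer):
    """Count how many documents produce each distinct token."""
    counts = Counter()
    for values in values_per_doc:
        tokens = set()
        for value in values:
            tokens.update(analyzer(value))
        counts.update(tokens)
    return counts

# One list of field values per document (hypothetical data).
brands = [["Acme Tools"], ["Acme Tools"], ["Acme Paint"]]

string_facets = facet_counts(brands, analyze_string)
text_facets = facet_counts(brands, analyze_text)

print(string_facets)
print(text_facets)
```

The string-style analyzer yields facet values such as "Acme Tools", which is what you would normally show to a user, while the text-style analyzer yields per-word buckets such as "acme" and "tools".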

MatsLindh
  • Thanks, Mats. Homogeneous data, such as you would get in string fields, is of course good for faceting. And no, no extra fields are being copied in either. I can see the data is exactly identical. And yes, the content of the fields is being indexed at index time. But surely, when talking about facets, we do not index facet fields; rather, some fields are just better suited for faceting, isn't that correct? – Rasmus Edvardsen Dec 08 '19 at 09:48
  • That's correct, but you might want two fields with identical content processed in different ways. The content returned (i.e. the stored content if you fetch the fields with `fl` for a document) will be identical even if the processing behind the fields is different. If the analysis page output for both fields and their definitions are identical, there shouldn't be any reason for keeping both, unless the application using the Solr index specifically requires certain field names, such as automagically adding facets for those prefixed with `f_` or `fs_`... – MatsLindh Dec 08 '19 at 18:57
  • Alright. I'll check that their treatment is the same in analysis. If that comes up OK, I'll remove them. Thanks again, Mats. – Rasmus Edvardsen Dec 09 '19 at 07:26