I'm using Solr 3.6 to index many different types of documents. Several fields hold information common to all documents, one of them being 'date' (ideally the last-modified date, but anything that indicates how recent a document is would do):
<field name="date" type="date" indexed="true" stored="true" required="true" />
My problem arises when indexing rich-text documents such as .docx and .pdf. I want to fill in the date field using metadata from the ExtractingRequestHandler, but the metadata field that holds the date I want has a different name from file to file. Sometimes it's 'date', other times it's 'last_modified' or 'last_save_date'. I tried mapping 'last_modified' to the date field in the handler:
<str name="fmap.last_modified">date</str>
...but this led to problems where date was either multivalued (because the document also had its own 'date' metadata) or undefined (because 'last_modified' didn't exist). I looked into conditional copyFields to pull the value from at least one of these fields, but that seems complicated (it would mean extending the update handler) and would also require knowing the name of every possible field that could contain the date.
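To make the behavior I'm after concrete, here is a minimal sketch in Python of the fallback logic. It is not Solr code: the function name `pick_date`, the `CANDIDATE_FIELDS` list and its priority order are all hypothetical, and it assumes the extracted metadata is available as a mapping from field name to a list of string values.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical priority order, using only the three names I have seen so far.
CANDIDATE_FIELDS = ("last_modified", "last_save_date", "date")


def pick_date(
    metadata: Mapping[str, Sequence[str]],
    candidates: Sequence[str] = CANDIDATE_FIELDS,
) -> Optional[str]:
    """Return the first non-blank value found under any candidate name.

    Exactly one value (or None) comes back, so the target field is never
    multivalued, and None signals that no candidate was present at all.
    """
    for name in candidates:
        for value in metadata.get(name) or []:
            if value and value.strip():
                return value.strip()
    return None
```

For example, `pick_date({"date": ["2012-05-01T10:00:00Z"], "last_save_date": ["2012-06-01T09:30:00Z"]})` returns `"2012-06-01T09:30:00Z"`, because 'last_save_date' ranks above 'date' in the assumed order. The weakness is the same one described above: the list of candidate names still has to be known in advance.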
Is there any way that I can reliably extract a date from every rich-text document that I process?