I would like to specify the date format dd/MM/yyyy for a field of type date. I know the following methods:

  1. edit schema.xml and add the datetimeformat="dd/MM/yyyy" attribute to the <field /> tag involved, but I haven't tested it. Or,
  2. edit solrconfig.xml and add a <str>dd/MM/yyyy</str> tag to the processor of class solr.ParseDateFieldUpdateProcessorFactory. I'm sure this works because I've personally tested it.
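
To make the goal concrete, here is a small shell sketch of the conversion the dd/MM/yyyy pattern is meant to achieve: a value such as 29/09/2016 has to end up in the ISO 8601 UTC form that Solr date fields use. The helper name to_solr_date is hypothetical, midnight UTC is an assumption, and this only illustrates the mapping, not how Solr implements it.

```shell
# Convert a dd/MM/yyyy string into Solr's ISO 8601 date syntax.
# Assumption: the date is interpreted as midnight UTC.
to_solr_date() {
  # Split the argument on "/" into day, month and year.
  old_ifs=$IFS
  IFS=/
  set -- $1
  IFS=$old_ifs
  # Reassemble as yyyy-MM-ddTHH:mm:ssZ.
  printf '%s-%s-%sT00:00:00Z\n' "$3" "$2" "$1"
}

to_solr_date "29/09/2016"
```

The last line prints 2016-09-29T00:00:00Z, which is the kind of value the date field should receive once the format is configured.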

I would like to use the managed schema and the Schema API instead of editing schema.xml. This is convenient in both standalone and SolrCloud setups.

In order to add a date field, I do as follows:

curl http://localhost:8983/solr/test/schema -X POST -H 'Content-type:application/json' --data-binary '
{   
  "add-field":
  {
    "name":"mydate",     
    "type":"date",
    "stored":true, 
    "indexed":true
  }
}'

and to edit a field property, such as stored, I do the following (replace-field replaces the whole definition, so the type has to be supplied again):

curl -X POST -H 'Content-type:application/json' --data-binary '
{
  "replace-field":
  {
    "name":"mydate",
    "type":"date",
    "stored":false
  }
}' http://localhost:8983/solr/test/schema

If I try to set "datetimeformat":"dd/MM/yyyy" when creating or editing the field, I get an error.

Is it possible to edit the date format using only the Schema API without editing any *.xml file?

UPDATE

I tried this command without any success:

curl http://localhost:8983/solr/test/config -H 'Content-type:application/json' -d '
{
  "update-updateprocessor" : 
  {
    "class": "solr.ParseDateFieldUpdateProcessorFactory", 
    "name":"solr.ParseDateFieldUpdateProcessorFactory",
    "format":["dd/MM/yyyy"]
  }
}'

The problem is that the original definition of solr.ParseDateFieldUpdateProcessorFactory in solrconfig.xml is:

<processor class="solr.ParseDateFieldUpdateProcessorFactory">
  <arr name="format">
    <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
    <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
    <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
    <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
    <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
    <str>yyyy-MM-dd'T'HH:mm:ss</str>
    <str>yyyy-MM-dd'T'HH:mmZ</str>
    <str>yyyy-MM-dd'T'HH:mm</str>
    <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
    <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
    <str>yyyy-MM-dd HH:mm:ss.SSS</str>
    <str>yyyy-MM-dd HH:mm:ss,SSS</str>
    <str>yyyy-MM-dd HH:mm:ssZ</str>
    <str>yyyy-MM-dd HH:mm:ss</str>
    <str>yyyy-MM-dd HH:mmZ</str>
    <str>yyyy-MM-dd HH:mm</str>
    <str>yyyy-MM-dd</str>
  </arr>
</processor>

and it doesn't have a name attribute. If I omit the "name" attribute in the JSON request, Solr throws the error 'name' is a required field. I tried various combinations but none worked: "name":"solr.ParseDateFieldUpdateProcessorFactory", "name":"ParseDateFieldUpdateProcessorFactory", "name":"".

UPDATE 2

Running curl http://localhost:8983/solr/test/config returns a JSON object. Here's a portion of it:

{
...
    "updateRequestProcessorChain":[{
    "name":"add-unknown-fields-to-the-schema",
    "":[{"class":"solr.UUIDUpdateProcessorFactory"},
      {"class":"solr.LogUpdateProcessorFactory"},
      {"class":"solr.DistributedUpdateProcessorFactory"},
      {"class":"solr.RemoveBlankFieldUpdateProcessorFactory"},
      {
        "class":"solr.FieldNameMutatingUpdateProcessorFactory",
        "pattern":"[^\\w-\\.]",
        "replacement":"_"},
      {"class":"solr.ParseBooleanFieldUpdateProcessorFactory"},
      {"class":"solr.ParseLongFieldUpdateProcessorFactory"},
      {"class":"solr.ParseDoubleFieldUpdateProcessorFactory"},
      {"class":"solr.ParseDateFieldUpdateProcessorFactory"},
      {"class":"solr.AddSchemaFieldsUpdateProcessorFactory"},
      {"class":"solr.RunUpdateProcessorFactory"}]}],
...
}

This means that solr.ParseDateFieldUpdateProcessorFactory is one of the processors inside an updateRequestProcessorChain (here, add-unknown-fields-to-the-schema). The documentation states:

The Config API does not let you create or edit <updateRequestProcessorChain> elements. However, it is possible to create <updateProcessor> entries and can use them by name to create a chain.

This means that it's not possible to add a specific date format to the existing solr.ParseDateFieldUpdateProcessorFactory using the Config API. I should instead create a separate update processor that does what I want, using the add-updateprocessor command with the proper parameters.

pietrop

2 Answers

After struggling with the horrific Solr documentation, I found a solution. The documentation states:

The Config API does not let you create or edit <updateRequestProcessorChain> elements. However, it is possible to create <updateProcessor> entries and can use them by name to create a chain.

[ ... ]

You can use this directly in your request by adding a parameter in the <updateRequestProcessorChain> for the specific update processor called processor=firstFld.

This means that I have to add a custom update processor and invoke it explicitly when using the /update handler. So:

curl http://localhost:8983/solr/test/config -H 'Content-type:application/json' -d '
{
  "add-updateprocessor" : 
  {
    "name" : "myCustomDateUpdateProcessor", 
    "class": "solr.ParseDateFieldUpdateProcessorFactory", 
    "format":["dd/MM/yyyy"]
  }
}'
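
To check that the command was accepted, the overlay can be read back through the Config API's /config/overlay endpoint. The following is a minimal sketch; SOLR_BASE, overlay_url and show_overlay are hypothetical names introduced here, and the base URL is the one assumed throughout this post.

```shell
# Base URL of the collection; "test" is the collection used in this post.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr/test}"

# Print the Config API URL that returns only the overlay contents.
overlay_url() {
  printf '%s/config/overlay\n' "$SOLR_BASE"
}

# Fetch the overlay from a running Solr instance. If the
# add-updateprocessor command succeeded, the new processor
# should appear in the JSON response.
show_overlay() {
  curl -s "$(overlay_url)"
}
```

Calling show_overlay against a running Solr should return JSON that mentions myCustomDateUpdateProcessor; the exact layout of that response is not shown here because it may differ between versions.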

To load the data into the test collection via the /update/csv handler, use this command (the URL is quoted so that the shell does not interpret the & as a background operator):

curl 'http://localhost:8983/solr/test/update/csv?processor=myCustomDateUpdateProcessor&commit=true' --data-binary @file.csv -H 'Content-type:text/plain; charset=utf-8'

Note the presence of processor=myCustomDateUpdateProcessor, where myCustomDateUpdateProcessor is the update processor I created before. The processor is stored in configoverlay.json and not in solrconfig.xml.
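
For repeated loads, the request can be wrapped in a small helper so that the processor parameter is never forgotten and the query string is always quoted. This is only a sketch: SOLR_BASE, update_url and load_csv are hypothetical names, and the URL, handler and headers are the ones used above.

```shell
# Base URL of the collection; "test" is the collection used in this post.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr/test}"

# Build the /update/csv URL that routes documents through the
# named update processor and commits afterwards.
update_url() {
  printf '%s/update/csv?processor=%s&commit=true\n' "$SOLR_BASE" "$1"
}

# Send a CSV file to Solr through the given processor.
#   $1 = processor name, $2 = path to the CSV file
load_csv() {
  curl "$(update_url "$1")" \
    --data-binary "@$2" \
    -H 'Content-type:text/plain; charset=utf-8'
}
```

With a running Solr, load_csv myCustomDateUpdateProcessor file.csv sends the same request as the command above.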

pietrop

You have a bunch of things confused here:

  1. datetimeformat belongs to the DataImportHandler mapping definition. There is no datetimeformat attribute in the schema file.
  2. If you are using the managed schema, you don't actually have a schema.xml; you have a managed-schema file.
  3. If you are using a recent Solr, you have the Config API to modify the configuration from solrconfig.xml. Unlike managed-schema, the changes are actually written to a separate configoverlay.json file, but the end result is the same.

So, you have to set the format in the UpdateRequestProcessor, but you can manage that via the API (in recent Solr versions).

Alexandre Rafalovitch
  • Thank you for the clarifications. My question was exactly about how to set the date format using the API. The documentation is lacking on that topic, or maybe I didn't search enough. – pietrop Sep 29 '16 at 07:58
  • BTW, I'll follow the link you posted to figure out how to correctly invoke the API. – pietrop Sep 29 '16 at 09:14
  • I updated my question with what I have come up with so far. The documentation is very cryptic and incomplete IMHO. I come from PostgreSQL, which has far lighter, clearer, more consistent, more complete and better structured documentation. I tried using the `update-updateprocessor` request, but it is probably meant for custom update processors and not for factory ones. So I still can't find a way to add `dd/MM/yyyy` to `solrconfig.xml`. Thank you for your time and help – pietrop Sep 29 '16 at 11:04