Out of the box, the Nutch index writer for Elasticsearch creates an index in Elasticsearch whose name comes from the following property in nutch-site.xml (or nutch-default.xml):
<property>
  <name>elastic.index</name>
  <value>nutch</value>
  <description>Default index to send documents to.</description>
</property>
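The lookup order mentioned above can be illustrated with a minimal Python sketch. It assumes that nutch-site.xml is loaded after nutch-default.xml, so that a value in nutch-site.xml overrides the default. This is how Nutch's Hadoop-style configuration behaves, but the sketch ignores details such as `<final>` and variable expansion. The helper names and the `my_crawl` index name are hypothetical.

```python
import xml.etree.ElementTree as ET

# The property as shipped in nutch-default.xml (copied from above).
NUTCH_DEFAULT = """<configuration>
  <property>
    <name>elastic.index</name>
    <value>nutch</value>
    <description>Default index to send documents to.</description>
  </property>
</configuration>"""

# An empty nutch-site.xml: the default value is used.
NUTCH_SITE = """<configuration>
</configuration>"""

# A nutch-site.xml that overrides the index name (hypothetical value).
SITE_OVERRIDE = """<configuration>
  <property>
    <name>elastic.index</name>
    <value>my_crawl</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def resolve(name, *xml_texts):
    """Return the value of `name`; files later in the list override earlier ones."""
    merged = {}
    for text in xml_texts:
        merged.update(load_properties(text))
    return merged.get(name)


print(resolve("elastic.index", NUTCH_DEFAULT, NUTCH_SITE))
print(resolve("elastic.index", NUTCH_DEFAULT, SITE_OVERRIDE))
```

The first call yields the default index name and the second yields the overridden one.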
The mappings section in Elasticsearch for such an automatically generated index always has the following structure:
{
  "nutch": {
    "mappings": {
      "doc": {
        "properties": {
          "anchor": {
            "type": "string"
          },
          "boost": {
            "type": "string"
          },
          "cache": {
            "type": "string"
          },
          "content": {
            "type": "string"
          },
          "contentLength": {
            "type": "string"
          },
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "digest": {
            "type": "string"
          },
          "host": {
            "type": "string"
          },
          "id": {
            "type": "string"
          },
          "lang": {
            "type": "string"
          },
          "lastModified": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "segment": {
            "type": "string"
          },
          "title": {
            "type": "string"
          },
          "tstamp": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "type": {
            "type": "string"
          },
          "url": {
            "type": "string"
          }
        }
      }
    }
  }
}
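To compare the field types in such a mapping (for example, to notice that `boost` and `contentLength` are mapped as `string` rather than as numbers), the JSON can be reduced to a field-to-type table. The following Python sketch assumes the mapping has exactly the nesting shown above (index, then `mappings`, then mapping type, then `properties`), which is the shape returned by Elasticsearch's get-mapping request (`GET /nutch/_mapping`) in the versions that still use `string`. The sample is a trimmed copy of the mapping above, and `field_types` is a hypothetical helper.

```python
import json

# Trimmed copy of the mapping shown above (only a few fields kept).
MAPPING_RESPONSE = """{
  "nutch": {
    "mappings": {
      "doc": {
        "properties": {
          "boost": {"type": "string"},
          "content": {"type": "string"},
          "date": {"type": "date", "format": "dateOptionalTime"},
          "url": {"type": "string"}
        }
      }
    }
  }
}"""


def field_types(response_text, index, doc_type):
    """Return {field: type} for one index and one mapping type."""
    data = json.loads(response_text)
    props = data[index]["mappings"][doc_type]["properties"]
    return {field: spec.get("type") for field, spec in props.items()}


types = field_types(MAPPING_RESPONSE, "nutch", "doc")
for field, ftype in sorted(types.items()):
    print(f"{field}: {ftype}")
```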
- Where is the template for this?
- Can it be changed?
- If yes, which fields are mandatory and which are optional?
- Where can I find more information on this?
Any help appreciated! Thanks, Wolfram