I am working on a project where I heavily use Elasticsearch and leverage the moreLikeThis query to implement some features. The official documentation for the MLT query states the following:
In order to speed up analysis, it could help to store term vectors at index time, but at the expense of disk usage.
In the **How it works* section. The idea now is then to tune the mapping so store the pre calculated term vectors. The problem is that it seems unclear from the documentation how exactly this should be done. On one side, in the MLT documentation, they provide and example mapping that looks like this:
curl -s -XPUT 'http://localhost:9200/imdb/' -d '{
"mappings": {
"movies": {
"properties": {
"title": {
"type": "string",
"term_vector": "yes"
},
"description": {
"type": "string"
},
"tags": {
"type": "string",
"fields" : {
"raw": {
"type" : "string",
"index" : "not_analyzed",
"term_vector" : "yes"
}
}
}
}
}
}
}
On the other side, in the Term Vectors documentation, they provide a mapping in the Example 1 section that looks like this
curl -s -XPUT 'http://localhost:9200/twitter/' -d '{
"mappings": {
"tweet": {
"properties": {
"text": {
"type": "string",
"term_vector": "with_positions_offsets_payloads",
"store" : true,
"index_analyzer" : "fulltext_analyzer"
},
"fullname": {
"type": "string",
"term_vector": "with_positions_offsets_payloads",
"index_analyzer" : "fulltext_analyzer"
}
}
}
....
This should create an index that stores term vectors, payloads etc.
Now the question is: which of the mapping should be used? Is it a flaw in the documentation or am I missing something?