Elasticsearch: How to store term vectors

Question

I am working on a project where I heavily use Elasticsearch and leverage the moreLikeThis query to implement some features. The official documentation for the MLT query states the following:

In order to speed up analysis, it could help to store term vectors at index time, but at the expense of disk usage.

This is from the "How it works" section. The idea, then, is to tune the mapping so that it stores the precomputed term vectors. The problem is that the documentation does not make clear how exactly this should be done. On one side, the MLT documentation provides an example mapping that looks like this:

curl -s -XPUT 'http://localhost:9200/imdb/' -d '{
  "mappings": {
    "movies": {
      "properties": {
        "title": {
          "type": "string",
          "term_vector": "yes"
        },
        "description": {
          "type": "string"
        },
        "tags": {
          "type": "string",
          "fields" : {
            "raw": {
              "type" : "string",
              "index" : "not_analyzed",
              "term_vector" : "yes"
            }
          }
        }
      }
    }
  }
}'

On the other side, the Term Vectors documentation provides a mapping in its Example 1 section that looks like this:

curl -s -XPUT 'http://localhost:9200/twitter/' -d '{
  "mappings": {
    "tweet": {
      "properties": {
        "text": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store" : true,
          "index_analyzer" : "fulltext_analyzer"
        },
        "fullname": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "index_analyzer" : "fulltext_analyzer"
        }
      }
    }
    ....

This should create an index that stores term vectors, payloads, etc.

Now the question is: which of the two mappings should be used? Is this a flaw in the documentation, or am I missing something?

The second example just stores extra information as well. I guess it should be enough for you to just use "yes". — Mysterion, Aug 28 '15 at 11:55
But is this behaviour documented somewhere? For example, that "yes" does something and "with_positions_offsets_payloads" does more? — Nicola Miotto, Aug 28 '15 at 13:39

score 10 · Accepted Answer · edited Nov 10 '16 at 15:15

You are right: it doesn't seem to be explicitly mentioned in the current version of the documentation. However, the documentation for the upcoming 2.0 release has a more detailed explanation:

Term vectors contain information about the terms produced by the analysis process, including:

a list of terms.

the position (or order) of each term.

the start and end character offsets mapping the term to its origin in the original string.

These term vectors can be stored so that they can be retrieved for a particular document.

The term_vector setting accepts:

no: No term vectors are stored. (default)

yes: Just the terms in the field are stored.

with_positions: Terms and positions are stored.

with_offsets: Terms and character offsets are stored.

with_positions_offsets: Terms, positions, and character offsets are stored.
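
As a rough illustration of the list above, here is a small Python sketch (not taken from the Elasticsearch documentation). It picks the value that stores the least data while still covering what you need, then builds an index-creation body in the style of the mapping from the question. The helper names `cheapest_option` and `build_mapping` are hypothetical. The assumption that More Like This only needs the terms follows the comment on the question, not an official statement.

```python
import json

# What each term_vector value stores, as listed in the quoted documentation.
TERM_VECTOR_OPTIONS = {
    "no": set(),
    "yes": {"terms"},
    "with_positions": {"terms", "positions"},
    "with_offsets": {"terms", "offsets"},
    "with_positions_offsets": {"terms", "positions", "offsets"},
}


def cheapest_option(needed):
    """Return the option that stores the least data while covering `needed`."""
    needed = set(needed)
    candidates = [
        (len(stored), name)
        for name, stored in TERM_VECTOR_OPTIONS.items()
        if needed <= stored
    ]
    if not candidates:
        raise ValueError("no listed term_vector option stores: %s" % sorted(needed))
    # The tuple comparison picks the option with the fewest stored components.
    return min(candidates)[1]


def build_mapping(doc_type, field, needed):
    """Build a mapping body shaped like the one in the question (hypothetical helper)."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {
                        "type": "string",  # legacy type, as used in the question
                        "term_vector": cheapest_option(needed),
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Assumption: More Like This only needs the terms themselves.
    body = build_mapping("movies", "title", {"terms"})
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the `curl -XPUT` command shown in the question. If positions or offsets are needed later (for example for highlighting), changing the `needed` set selects a richer option at the cost of more disk usage.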

wonderful, thanks :) I just hope the 2.0 doc applies to the previous versions too, but from the few tests I did, looks like it does — Nicola Miotto, Aug 28 '15 at 14:36
