I'm using this query to find out which values a single field holds (in SQL terms, something like SELECT field, count(field) ... GROUP BY field).

In order to do that I'm sending this request to ES:

{
  "query" : {
    "bool" : {
      "must" : {
        "exists" : {
          "field" : "metainfos.ceeaacceaeaaccebeaacceceaaccedeaac"
        }
      }
    }
  },
  "aggregations" : {
    "followUpActivity.metainfo.metainfos.ceeaacceaeaaccebeaacceceaaccedeaac" : {
      "terms" : {
        "field" : "metainfos.ceeaacceaeaaccebeaacceceaaccedeaac",
        "missing" : "null"
      }
    }
  }
}

There's only one document in this index:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "living_v1",
      "_type" : "fuas",
      "_id" : "a2cb0ba1-8955-11e6-8a00-0242ac110007",
      "_score" : 1.0,
      "_routing" : "user2",
      "_source" : {
        "user" : "user2",
        "timestamp" : "2016-10-03T11:08:30.074Z",
        "startTimestamp" : "2016-10-03T11:08:30.074Z",
        "dueTimestamp" : null,
        "closingTimestamp" : null,
        "matter" : "Fua 1",
        "comment" : null,
        "status" : 0,
        "backlogStatus" : 20,
        "metainfos" : {
          "ceeaacceaeaaccebeaacceceaaccedeaac" : [ "Living Digital" ]
        },
        "resources" : [ ],
        "notes" : null
      }
    } ]
  }
}

As you can see, doc.metainfos.ceeaacc... = ["Living Digital"]. This is the response to the request above:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "living_v1",
      "_type" : "fuas",
      "_id" : "a2cb0ba1-8955-11e6-8a00-0242ac110007",
      "_score" : 1.0,
      "_routing" : "user2",
      "_source":{"user":"user2","timestamp":"2016-10-03T11:08:30.074Z","startTimestamp":"2016-10-03T11:08:30.074Z","dueTimestamp":null,"closingTimestamp":null,"matter":"Fua 1","comment":null,"status":0,"backlogStatus":20,"metainfos":{"ceeaacceaeaaccebeaacceceaaccedeaac":["Living Digital"]},"resources":[],"notes":null}
    } ]
  },
  "aggregations" : {
    "followUpActivity.metainfo.metainfos.ceeaacceaeaaccebeaacceceaaccedeaac" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "digital",
        "doc_count" : 1
      }, {
        "key" : "living",
        "doc_count" : 1
      } ]
    }
  }
}

ES is returning two buckets: one for "living" and another one for "digital". I'd like the aggregation to use the whole value "Living Digital".

The mapping scheme is:

{
  "living_v1" : {
    "mappings" : {
      "fuas" : {
        "properties" : {
          "backlogStatus" : {
            "type" : "long"
          },
          "comment" : {
            "type" : "string"
          },
          "matter" : {
            "type" : "string"
          },
          "metainfos" : {
            "properties" : {
              "ceeaacceaeaaccebeaacceceaaccedeaac" : {
                "type" : "string"
              }
            }
          },
          "startTimestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "status" : {
            "type" : "long"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "user" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}

As you can see:

"metainfos" : {
    "properties" : {
        "ceeaacceaeaaccebeaacceceaaccedeaac" : {
             "type" : "string"
         }
     }
 }

The problem is that "ceeaacceaeaaccebeaacceceaaccedeaac" is a property created on demand by users, and I don't know how to make every metainfos.* field not_analyzed.

EDIT

I've tested with:

#curl -XPUT 'http://localhost:9200/living_v1/' -d '
{
  "mappings": {
    "fuas": {
      "dynamic_templates": [
        {
          "metainfos": {
            "path_match":   "metainfos.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
'

It's telling me that the living_v1 index already exists. As far as I've been able to figure out from here, I need to send a PUT against the index:

{
  "error": {
    "root_cause": [
      {
        "type": "index_already_exists_exception",
        "reason": "already exists",
        "index": "living_v1"
      }
    ],
    "type": "index_already_exists_exception",
    "reason": "already exists",
    "index": "living_v1"
  },
  "status": 400
}
Jordi
  • I think you are looking for dynamic index templates: http://stackoverflow.com/a/23370138/693546 – mblaettermann Oct 03 '16 at 12:01
  • You can't update the mapping for anything that already has data. You may need to create a new index with the fixed mapping, then reindex your data into this index. Then you could delete the old index and use its name as an alias for the new index. – kiml42 Apr 25 '19 at 10:23
  • Another option is to add an additional field (https://www.elastic.co/guide/en/elasticsearch/reference/6.4/multi-fields.html) (call it "raw", for instance). Then you can aggregate on "ceeaacceaeaaccebeaacceceaaccedeaac.raw" instead, while preserving the mapping of "ceeaacceaeaaccebeaacceceaaccedeaac". I think this will only affect documents that are indexed after you change the mappings, however. – kiml42 Apr 25 '19 at 10:27
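
The multi-field idea from the last comment could look roughly like the sketch below. It only builds the request bodies as Python dicts and prints them as JSON; nothing is sent to a cluster. It assumes the pre-5.x "string" type used in the question, the sub-field name "raw" comes from the comment, and the aggregation name "values" is a made-up placeholder.

```python
import json

# Field name taken from the question.
FIELD = "metainfos.ceeaacceaeaaccebeaacceceaaccedeaac"


def multi_field_template():
    """Dynamic template: keep the analyzed string, add a not_analyzed "raw" sub-field."""
    return {
        "metainfos": {
            "path_match": "metainfos.*",
            "match_mapping_type": "string",
            "mapping": {
                "type": "string",
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            },
        }
    }


def terms_aggregation_on_raw(field):
    """Search body that aggregates on the untouched value instead of the tokens."""
    return {
        "size": 0,  # only the buckets are of interest, not the hits
        "aggregations": {
            "values": {"terms": {"field": field + ".raw"}}  # "values" is a placeholder name
        },
    }


print(json.dumps({"dynamic_templates": [multi_field_template()]}, indent=2))
print(json.dumps(terms_aggregation_on_raw(FIELD), indent=2))
```

The first body would go under the "fuas" type in the mappings of a newly created index; the second is the search request to run once documents have been indexed with that mapping.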

2 Answers

As you already noticed, the search behaviour is caused by the mapping that was applied by default. This default mapping analyzes every string-valued field that is not explicitly mapped otherwise, so "Living Digital" is indexed as the two terms "living" and "digital".
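
To illustrate why two buckets come back, here is a small Python sketch. It is not Elasticsearch code: approximate_standard_analyzer is a rough stand-in (lowercase, split on non-word characters) for what the default analyzer does to this particular value, and terms_buckets is a hypothetical helper that counts documents per indexed term the way a terms aggregation would.

```python
import re
from collections import Counter


def approximate_standard_analyzer(text):
    """Rough approximation only: lowercase and split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def terms_buckets(documents, analyzed):
    """Count documents per indexed term; each document is a list of field values."""
    counts = Counter()
    for values in documents:
        terms = set()  # a document counts once per distinct term
        for value in values:
            if analyzed:
                terms.update(approximate_standard_analyzer(value))
            else:
                terms.add(value)  # not_analyzed: the value is indexed as-is
        counts.update(terms)
    return dict(counts)


docs = [["Living Digital"]]  # the single document from the question
print(terms_buckets(docs, analyzed=True))   # one bucket per token
print(terms_buckets(docs, analyzed=False))  # one bucket for the whole value
```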

So if you don't yet know which properties (=keys) will be in the metainfos object, you can use the dynamic templates feature as described here and here to define which mapping should be applied for these fields and so override the default behaviour of analyzing a string field.

You could apply a mapping that looks a bit like this (not tested):

{
  "mappings": {
    "fuas": {
      "dynamic_templates": [
        {
          "metainfos": {
            "path_match":   "metainfos.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
Andreas Jägle
  • Thanks a lot @Andreas. I've tried it, however some issues have come up. What's the difference between templates and your approach? Are they the same? – Jordi Oct 03 '16 at 12:50

As other people have pointed out, dynamic templates are the way to go. The only problem is that a dynamic template does not change fields that were already mapped when documents were indexed, so you will need to recreate the index (delete the index, create the mapping, feed the documents again).
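
As a minimal sketch of those steps, the Python below lists the requests for a variant that copies into a new index instead of deleting and re-feeding. It only prints the requests and contacts no cluster. The target name "living_v2" and the helper name are hypothetical, and the _reindex endpoint assumes Elasticsearch 2.3 or later; on older versions the documents would have to be fed again by the application.

```python
import json


def recreate_index_requests(old_index, new_index, doc_type):
    """Return (method, path, body) tuples: create the new index, then copy the data."""
    template = {
        "metainfos": {
            "path_match": "metainfos.*",
            "match_mapping_type": "string",
            "mapping": {"type": "string", "index": "not_analyzed"},
        }
    }
    return [
        # 1. Create the new index with the dynamic template in place from the start.
        ("PUT", "/" + new_index,
         {"mappings": {doc_type: {"dynamic_templates": [template]}}}),
        # 2. Copy the documents over (assumes the _reindex API is available).
        ("POST", "/_reindex",
         {"source": {"index": old_index}, "dest": {"index": new_index}}),
    ]


for method, path, body in recreate_index_requests("living_v1", "living_v2", "fuas"):
    print(method, path)
    print(json.dumps(body, indent=2))
```

Once the copy has been verified, the old index can be deleted and the application pointed at the new one.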

oldbam
  • Ok @oldbam, I got it. Is there some straightforward way to recreate the index from `index_v1` to `index_v2`? – Jordi Oct 03 '16 at 14:02
  • You may consider looking at the answers at http://stackoverflow.com/questions/28626803/how-to-rename-an-index-in-a-cluster . I always deleted the index and started feeding documents again when I was changing the index template. – oldbam Oct 04 '16 at 17:08