Filter Elasticsearch documents by a sub-value

Question

I am storing Java call-stack information in Elasticsearch.
Each call-stack element represents one method in the call stack and is stored as a separate document in Elasticsearch.
Each method has a unique method_id (a long value, unique across all call stacks). All method_ids have the same length (the same number of digits).
Each document contains an attribute called 'tree_path', which represents the method call hierarchy. tree_path is composed of method_ids separated by "/".
In the index mapping in Elasticsearch, tree_path is of type text (with keyword and path-hierarchy sub-fields), while method_id is of type long.

For example:

To represent the call stack of the method invocation main(), I insert the following document into Elasticsearch:

{
  "timestamp": 1,
  "method_id" : 140349263585496,
  "method_name" : "main",
  "tree_path" : "/140349263585496",
  "elapsed_time" : 64
}

To represent the call stack of the calls main() -> launch(), I insert the following document into Elasticsearch:

{
  "timestamp": 1,
  "method_id" : 140351821216632,
  "method_name" : "launch",
  "tree_path" : "/140349263585496/140351821216632",
  "elapsed_time" : 56
}
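
To make the tree_path format in the two examples above concrete, here is a minimal Python sketch of how such a path can be built from a chain of method_ids and split back into them. The helper names build_tree_path and split_tree_path are made up for this illustration only.

```python
def build_tree_path(method_ids):
    """Join method_ids (outermost caller first) into a '/'-separated tree_path."""
    return "".join(f"/{method_id}" for method_id in method_ids)


def split_tree_path(tree_path):
    """Recover the list of method_ids from a tree_path string."""
    # The leading '/' yields an empty first part, which is skipped.
    return [int(part) for part in tree_path.split("/") if part]


main_id = 140349263585496    # main()
launch_id = 140351821216632  # launch()

path = build_tree_path([main_id, launch_id])
print(path)                   # tree_path of the main() -> launch() document
print(split_tree_path(path))  # the method_ids along that path
```

The last method_id in the path is the document's own method_id; everything before it is the chain of callers.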

Each call stack can appear many times in Elasticsearch (i.e. many documents in the index can share the same tree_path).

The index mapping in Elasticsearch is as follows:

PUT callstack-test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_path_tree": {
          "tokenizer": "custom_hierarchy"
        },
        "custom_path_tree_reversed": {
          "tokenizer": "custom_hierarchy_reversed"
        }
      },
      "tokenizer": {
        "custom_hierarchy": {
          "type": "path_hierarchy",
          "delimiter": "/"
        },
        "custom_hierarchy_reversed": {
          "type": "path_hierarchy",
          "delimiter": "/",
          "reverse": "true"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      },
      "method_id": {
        "type": "long"
      },
      "method_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "elapsed_time": {
        "type": "integer"
      },
      "tree_path": {
        "type": "text",
        "fields": {
          "tree": {
            "type": "text",
            "analyzer": "custom_path_tree"
          },
          "tree_reversed": {
            "type": "text",
            "analyzer": "custom_path_tree_reversed"
          },
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

The documents are inserted into the index as follows:

POST callstack-test/_doc/1
{
  "timestamp": 1,
  "method_id" : 140349263585496,
  "method_name" : "main",
  "tree_path" : "/140349263585496",
  "elapsed_time" : 64
}

POST callstack-test/_doc/7
{
  "timestamp": 7,
  "method_id" : 140349263585496,
  "method_name" : "main",
  "tree_path" : "/140349263585496",
  "elapsed_time" : 83
}

POST callstack-test/_doc/2
{
  "timestamp": 1,
  "method_id": 140351821216632,
  "method_name": "launch",
  "tree_path": "/140349263585496/140351821216632",
  "elapsed_time": 56
}

POST callstack-test/_doc/3
{
  "timestamp": 1,
  "method_id": 140351821338528,
  "method_name": "launchInternal",
  "tree_path": "/140349263585496/140351821216632/140351821338528",
  "elapsed_time": 52
}

POST callstack-test/_doc/6
{
  "timestamp": 3,
  "method_id": 140351821338528,
  "method_name": "launchInternal",
  "tree_path": "/140349263585496/140351821216632/140351821338528",
  "elapsed_time": 40
}

POST callstack-test/_doc/4
{
  "timestamp": 1,
  "method_id": 140351821338552,
  "method_name": "run",
  "tree_path": "/140349263585496/140351821216632/140351821338528/140351821338552",
  "elapsed_time": 47
}

POST callstack-test/_doc/5
{
  "timestamp": 1,
  "method_id": 140351821338552,
  "method_name": "blah",
  "tree_path": "/140349263585496/140351821216632/140351821337777/140351821338552",
  "elapsed_time": 21
}

So now the callstack-test index contains the following documents:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "callstack-test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1,
          "method_id" : 140349263585496,
          "method_name" : "main",
          "tree_path" : "/140349263585496",
          "elapsed_time" : 64
        }
      },
      {
        "_index" : "callstack-test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1,
          "method_id" : 140351821216632,
          "method_name" : "launch",
          "tree_path" : "/140349263585496/140351821216632",
          "elapsed_time" : 56
        }
      },
      {
        "_index" : "callstack-test",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1,
          "method_id" : 140351821338528,
          "method_name" : "launchInternal",
          "tree_path" : "/140349263585496/140351821216632/140351821338528",
          "elapsed_time" : 52
        }
      },
      {
        "_index" : "callstack-test",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1,
          "method_id" : 140351821338552,
          "method_name" : "run",
          "tree_path" : "/140349263585496/140351821216632/140351821338528/140351821338552",
          "elapsed_time" : 47
        }
      },
      {
        "_index" : "callstack-test",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1,
          "method_id" : 140351821338552,
          "method_name" : "blah",
          "tree_path" : "/140349263585496/140351821216632/140351821337777/140351821338552",
          "elapsed_time" : 21
        }
      },
      {
        "_index" : "callstack-test",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 3,
          "method_id" : 140351821338528,
          "method_name" : "launchInternal",
          "tree_path" : "/140349263585496/140351821216632/140351821338528",
          "elapsed_time" : 40
        }
      },
      {
        "_index" : "callstack-test",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 7,
          "method_id" : 140349263585496,
          "method_name" : "main",
          "tree_path" : "/140349263585496",
          "elapsed_time" : 83
        }
      }
    ]
  }
}

My question:

I need to query the database in the following way:

Query input:

A list of method_ids

Query output:

Find all documents whose 'tree_path' contains any of these method_ids anywhere in the path, and create one bucket for each unique tree_path among them. Each bucket should aggregate (sum) the elapsed_time values of its documents.
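
To make the expected result concrete, here is a client-side Python sketch of the filtering and bucketing I am after, run over the seven sample documents. It is not an Elasticsearch query, only an illustration of the semantics. The function name sum_elapsed_by_tree_path and the choice of input method_ids are made up for the example; the timestamp range 1 to 6 matches the range filter in my basic query.

```python
from collections import defaultdict

# (timestamp, tree_path, elapsed_time) for the seven sample documents.
DOCS = [
    (1, "/140349263585496", 64),
    (1, "/140349263585496/140351821216632", 56),
    (1, "/140349263585496/140351821216632/140351821338528", 52),
    (1, "/140349263585496/140351821216632/140351821338528/140351821338552", 47),
    (1, "/140349263585496/140351821216632/140351821337777/140351821338552", 21),
    (3, "/140349263585496/140351821216632/140351821338528", 40),
    (7, "/140349263585496", 83),
]


def sum_elapsed_by_tree_path(docs, method_ids, ts_from, ts_to):
    """Sum elapsed_time per tree_path for docs whose path contains any given id."""
    wanted = {str(method_id) for method_id in method_ids}
    buckets = defaultdict(int)
    for timestamp, tree_path, elapsed_time in docs:
        if not ts_from <= timestamp <= ts_to:
            continue  # outside the timestamp range
        # Compare whole path segments, so an id matches anywhere in the path.
        if wanted & set(tree_path.split("/")):
            buckets[tree_path] += elapsed_time
    return dict(buckets)


result = sum_elapsed_by_tree_path(DOCS, [140351821338528, 140351821337777], 1, 6)
for tree_path, total in sorted(result.items()):
    print(tree_path, total)
```

For these two input ids, the sketch produces three buckets: the launchInternal path with 52 + 40 = 92, the path ending in "run" with 47, and the path through 140351821337777 with 21.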

I have the basic query structure below, but I do not know how to add the "contains" condition to it.

GET callstack-test/_search
{
  "from": 0,
  "size": 0,
  "track_total_hits": true,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": 1,
              "lte": 6
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "tree_paths_agg": {
      "terms": {
        "field": "tree_path.keyword",
        "size": 20
      },
      "aggs": {
        "self_elapsed_time": {
          "sum": {
            "field": "elapsed_time"
          }
        }
      }
    }
  }
}

Note: I am not sure whether the analyzers I currently use in the mapping are the right choice, or whether I need them at all.
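
As far as I understand the documentation, the path_hierarchy tokenizer in its default (non-reversed) mode emits one token per path prefix. The Python sketch below only approximates that behavior to show why I doubt it helps here. path_hierarchy_tokens is a made-up helper, not an Elasticsearch API, and it does not model the reversed variant.

```python
def path_hierarchy_tokens(text, delimiter="/"):
    """Approximate forward path_hierarchy: one token per prefix ending at a segment."""
    parts = text.split(delimiter)
    tokens = []
    for end in range(1, len(parts) + 1):
        token = delimiter.join(parts[:end])
        if token:  # skip the empty prefix produced by the leading delimiter
            tokens.append(token)
    return tokens


tokens = path_hierarchy_tokens(
    "/140349263585496/140351821216632/140351821338528"
)
for token in tokens:
    print(token)

# Under this approximation, a bare inner method_id is never a token on its own.
print("140351821216632" in tokens)
```

If that reading is right, tree_path.tree can match a path by its leading segments, but not by a method_id that sits in the middle of the path.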

Can someone please point me in the right direction?

Thank you.
