- I am storing Java call-stack information in Elasticsearch.
- Each callstack element represents one method in the callstack, and is stored in a separate document in Elasticsearch.
- Each method has a unique method_id (unique long value across all callstacks). All method_ids have the same length (same number of digits).
- Each document contains an attribute called 'tree_path' which represents the methods call hierarchy. tree_path is composed of method_ids, separated by "/".
- In the Elasticsearch index mapping, tree_path is of type text (with several sub-fields), while method_id is of type long.
For example:
To represent the callstack of the method invocation main(), I index the following document into Elasticsearch:
"timestamp": 1,
"method_id" : 140349263585496,
"method_name" : "main",
"tree_path" : "/140349263585496",
"elapsed_time" : 64
To represent the callstack of the calls main() -> launch(), I index the following document into Elasticsearch:
"timestamp": 1,
"method_id" : 140351821216632,
"method_name" : "launch",
"tree_path" : "/140349263585496/140351821216632",
"elapsed_time" : 56
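To make the tree_path format concrete, here is a minimal Python sketch of how such a path can be built from a list of method_ids and split back into them. The helper names build_tree_path and path_method_ids are hypothetical and only serve this illustration; they are not part of my actual ingestion code.

```python
def build_tree_path(method_ids):
    """Join method_ids (outermost caller first) into a '/'-separated tree_path."""
    return "".join(f"/{method_id}" for method_id in method_ids)


def path_method_ids(tree_path):
    """Split a tree_path back into its method_ids, as integers."""
    return [int(segment) for segment in tree_path.split("/") if segment]


# main() -> launch(), using the ids from the example above
stack = [140349263585496, 140351821216632]
tree_path = build_tree_path(stack)
print(tree_path)
print(path_method_ids(tree_path))
```

The last id in the path is always the method_id of the document itself; everything before it is the chain of callers.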
Each callstack can appear many times in Elasticsearch (i.e. many documents in the index can share the same tree_path).
The index mapping in Elasticsearch is as follows:
PUT callstack-test
{
"settings": {
"analysis": {
"analyzer": {
"custom_path_tree": {
"tokenizer": "custom_hierarchy"
},
"custom_path_tree_reversed": {
"tokenizer": "custom_hierarchy_reversed"
}
},
"tokenizer": {
"custom_hierarchy": {
"type": "path_hierarchy",
"delimiter": "/"
},
"custom_hierarchy_reversed": {
"type": "path_hierarchy",
"delimiter": "/",
"reverse": true
}
}
}
},
"mappings": {
"properties": {
"timestamp": {
"type": "date"
},
"method_id": {
"type": "long"
},
"method_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"elapsed_time": {
"type": "integer"
},
"tree_path": {
"type": "text",
"fields": {
"tree": {
"type": "text",
"analyzer": "custom_path_tree"
},
"tree_reversed": {
"type": "text",
"analyzer": "custom_path_tree_reversed"
},
"keyword": {
"type": "keyword"
}
}
}
}
}
}
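To show what I expect the two analyzers in this mapping to produce, here is a small Python emulation of the path_hierarchy tokenizer for paths shaped like mine. This is only a sketch based on my understanding of the tokenizer (single-character delimiter, no skip or replacement options); the function path_hierarchy_tokens is hypothetical and is not an Elasticsearch API. The real output can be checked with the _analyze API.

```python
def path_hierarchy_tokens(path, delimiter="/", reverse=False):
    """Approximate the tokens path_hierarchy emits for `path`.

    Assumes a single-character delimiter and default tokenizer options.
    """
    cuts = [i for i, ch in enumerate(path) if ch == delimiter]
    if reverse:
        # Whole path first, then every suffix that starts after a delimiter.
        candidates = [path] + [path[i + 1:] for i in cuts]
    else:
        # Every prefix that ends just before a delimiter, then the whole path.
        candidates = [path[:i] for i in cuts] + [path]
    tokens = []
    for token in candidates:
        if token and token not in tokens:  # drop empty and repeated tokens
            tokens.append(token)
    return tokens


path = "/140349263585496/140351821216632/140351821338528"
print(path_hierarchy_tokens(path))                # prefixes (tree_path.tree)
print(path_hierarchy_tokens(path, reverse=True))  # suffixes (tree_path.tree_reversed)
```

If this emulation is faithful, tree_path.tree holds only prefixes of the path and tree_path.tree_reversed holds only suffixes, so an id in the middle of the path (140351821216632 here) never appears as a token on its own in either sub-field.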
The documents are indexed as follows:
POST callstack-test/_doc/1
{
"timestamp": 1,
"method_id" : 140349263585496,
"method_name" : "main",
"tree_path" : "/140349263585496",
"elapsed_time" : 64
}
POST callstack-test/_doc/7
{
"timestamp": 7,
"method_id" : 140349263585496,
"method_name" : "main",
"tree_path" : "/140349263585496",
"elapsed_time" : 83
}
POST callstack-test/_doc/2
{
"timestamp": 1,
"method_id": 140351821216632,
"method_name": "launch",
"tree_path": "/140349263585496/140351821216632",
"elapsed_time": 56
}
POST callstack-test/_doc/3
{
"timestamp": 1,
"method_id": 140351821338528,
"method_name": "launchInternal",
"tree_path": "/140349263585496/140351821216632/140351821338528",
"elapsed_time": 52
}
POST callstack-test/_doc/6
{
"timestamp": 3,
"method_id": 140351821338528,
"method_name": "launchInternal",
"tree_path": "/140349263585496/140351821216632/140351821338528",
"elapsed_time": 40
}
POST callstack-test/_doc/4
{
"timestamp": 1,
"method_id": 140351821338552,
"method_name": "run",
"tree_path": "/140349263585496/140351821216632/140351821338528/140351821338552",
"elapsed_time": 47
}
POST callstack-test/_doc/5
{
"timestamp": 1,
"method_id": 140351821338552,
"method_name": "blah",
"tree_path": "/140349263585496/140351821216632/140351821337777/140351821338552",
"elapsed_time": 21
}
So now the callstack-test index contains the following documents:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "callstack-test",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"timestamp" : 1,
"method_id" : 140349263585496,
"method_name" : "main",
"tree_path" : "/140349263585496",
"elapsed_time" : 64
}
},
{
"_index" : "callstack-test",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"timestamp" : 1,
"method_id" : 140351821216632,
"method_name" : "launch",
"tree_path" : "/140349263585496/140351821216632",
"elapsed_time" : 56
}
},
{
"_index" : "callstack-test",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"timestamp" : 1,
"method_id" : 140351821338528,
"method_name" : "launchInternal",
"tree_path" : "/140349263585496/140351821216632/140351821338528",
"elapsed_time" : 52
}
},
{
"_index" : "callstack-test",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"timestamp" : 1,
"method_id" : 140351821338552,
"method_name" : "run",
"tree_path" : "/140349263585496/140351821216632/140351821338528/140351821338552",
"elapsed_time" : 47
}
},
{
"_index" : "callstack-test",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"timestamp" : 1,
"method_id" : 140351821338552,
"method_name" : "blah",
"tree_path" : "/140349263585496/140351821216632/140351821337777/140351821338552",
"elapsed_time" : 21
}
},
{
"_index" : "callstack-test",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"timestamp" : 3,
"method_id" : 140351821338528,
"method_name" : "launchInternal",
"tree_path" : "/140349263585496/140351821216632/140351821338528",
"elapsed_time" : 40
}
},
{
"_index" : "callstack-test",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"timestamp" : 7,
"method_id" : 140349263585496,
"method_name" : "main",
"tree_path" : "/140349263585496",
"elapsed_time" : 83
}
}
]
}
}
My question:
I need to query the database in the following way:
Query input:
A list of method_ids
Query output:
All documents whose 'tree_path' contains any of these method_ids at any position in the path, grouped into one bucket per unique tree_path. Each bucket should aggregate (sum) the elapsed_time values of its documents.
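To pin down the result I am after, here is the same logic written as plain Python over the seven sample documents. It is a reference for the expected output only, not an Elasticsearch query; the function sum_elapsed_by_tree_path, the optional timestamp bounds, and the short id constants are hypothetical names introduced for this sketch.

```python
from collections import defaultdict

# Short names for the method_ids used in the sample documents.
MAIN, LAUNCH = 140349263585496, 140351821216632
LAUNCH_INTERNAL, RUN, OTHER = 140351821338528, 140351821338552, 140351821337777


def doc(timestamp, ids, elapsed_time):
    """Build a sample document; tree_path is '/'-joined ids with a leading '/'."""
    return {"timestamp": timestamp,
            "tree_path": "".join(f"/{i}" for i in ids),
            "elapsed_time": elapsed_time}


DOCS = [
    doc(1, [MAIN], 64),
    doc(7, [MAIN], 83),
    doc(1, [MAIN, LAUNCH], 56),
    doc(1, [MAIN, LAUNCH, LAUNCH_INTERNAL], 52),
    doc(3, [MAIN, LAUNCH, LAUNCH_INTERNAL], 40),
    doc(1, [MAIN, LAUNCH, LAUNCH_INTERNAL, RUN], 47),
    doc(1, [MAIN, LAUNCH, OTHER, RUN], 21),
]


def sum_elapsed_by_tree_path(docs, method_ids, ts_from=None, ts_to=None):
    """Sum elapsed_time per tree_path, for paths containing any given method_id."""
    wanted = {str(method_id) for method_id in method_ids}
    totals = defaultdict(int)
    for d in docs:
        if ts_from is not None and d["timestamp"] < ts_from:
            continue
        if ts_to is not None and d["timestamp"] > ts_to:
            continue
        # Compare whole path segments, so a partial id never matches.
        segments = d["tree_path"].strip("/").split("/")
        if wanted.intersection(segments):
            totals[d["tree_path"]] += d["elapsed_time"]
    return dict(totals)


result = sum_elapsed_by_tree_path(DOCS, [LAUNCH_INTERNAL], ts_from=1, ts_to=6)
for tree_path, total in result.items():
    print(tree_path, total)
```

For the input list [140351821338528] and timestamps 1 to 6, I would expect two buckets: the path ending in 140351821338528 with a sum of 92 (52 + 40), and the path ending in 140351821338552 that passes through 140351821338528 with a sum of 47.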
I have the basic query structure below, but I do not know how to add the "contains" condition to it.
GET callstack-test/_search
{
"from": 0,
"size": 0,
"track_total_hits": true,
"query": {
"bool": {
"filter": [
{
"range": {
"timestamp": {
"gte": 1,
"lte": 6
}
}
}
]
}
},
"aggs": {
"tree_paths_agg": {
"terms": {
"field": "tree_path.keyword",
"size": 20
},
"aggs": {
"self_elapsed_time": {
"sum": {
"field": "elapsed_time"
}
}
}
}
}
}
Note: I am not sure whether the analyzers I currently use in the mapping are needed at all.
Can someone please point me in the right direction?
Thank you.