We have a large corpus of JSON-formatted documents that we need to search for patterns and historical trends. Elasticsearch seems like the perfect fit for this problem. The first trick is that each document is a collection of tens of thousands of "nested" documents (plus a header). The second trick is that these nested documents represent data of varying types.
In order to accommodate this, all the value fields have been "encoded" as an array of strings, so a single integer value has been stored in the JSON as "[\"1\"]", and a table of floats is flattened to "[\"123.45\",\"678.9\",...]" and so on. (We also have arrays of strings, which don't need converting.) While this is awkward, I would have thought this would be a good compromise, given the way everything else involved in Elasticsearch seems to work.
The particular problem here is that these stored data values might represent a bitfield, from which we may need to inspect the state of one bit. Since this field will have been stored as a single-element string array, like "[\"14657\"]", we need to convert that to a single integer, and then shift it right to the desired bit (or apply a mask, if such an operation is available).
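To make the intended operation concrete, here is a minimal sketch in plain Java (not Painless) of the check I'm after: parse the first element of the string array, shift it right by the bit position, and mask off the lowest bit. The class name BitCheck and the helper isBitSet are hypothetical and exist only for this illustration. I'm assuming bit positions count from 0 at the least significant bit, and that the array has already been decoded into a list of strings.

```java
import java.util.List;

class BitCheck {
    // Hypothetical helper: parses the first element of the string-array
    // encoding and reports whether the bit at `position` is set.
    // Assumption: position 0 is the least significant bit.
    static boolean isBitSet(List<String> data, int position) {
        long value = Long.parseLong(data.get(0).trim());
        return ((value >> position) & 1L) == 1L;
    }

    public static void main(String[] args) {
        // 14657 = 8192 + 4096 + 2048 + 256 + 64 + 1, so bits 13, 12, 11, 8, 6 and 0 are set.
        List<String> data = List.of("14657");
        System.out.println(isBitSet(data, 0)); // true
        System.out.println(isBitSet(data, 3)); // false
    }
}
```

Painless syntax is close to Java, so something along these lines is what I would hope to write inside the script filter, but whether it works there against this field is exactly what I'm unsure of.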
With Elasticsearch, I see that I can embed "Painless" scripts, but examples vary, and I haven't been able to find one that shows how to convert the arbitrary-length string-array data field to the appropriate types for further comparison. Here's my query script as it stands.
{
  "_source" : false,
  "from" : 0, "size" : 10,
  "query": {
    "nested": {
      "path": "Variables",
      "query": {
        "bool": {
          "must": {
            "match": {"Variables.Designation": "Big_Long_Variable_Name"}
          },
          "must_not": {
            "match": {"Variables.Data": "[0]"}
          },
          "filter": {
            "script": {
              "script": {
                "source":
                "
                def vals = doc['Variables.Data'];
                return vals[0] != params.setting;
                ",
                "params": {
                  "setting": 3
                }
              }
            }
          }
        }
      },
      "inner_hits": {
        "_source": "Variables.Data"
      }
    }
  }
}
I need to somehow transform the vals variable to an array of ints, pick off the first value, do some bit operations, and make a comparison to return true or false. In this example, I'm hoping to be able to set "setting" equal to the bit position I want to check for on/off.
I've already been through the exercise of finding out that I needed to map my Variables.Data field as a keyword so I could search on specific values in it. I realize that this is getting away from the intent of Elasticsearch, but I still think this might be the best solution, for other reasons. I created a new index and reimported my test documents, and the index size went up about 30%. That's a compromise I'm willing to make, if I can get this to work.
What tools do I have in Painless to make this work? (Or, am I crazy to try to do this with this tool?)