How to get distinct rows from a list of lists using the Painless scripting language?

Question

I have a Groovy script:

def results = []
def cluster = ['cluster1', 'cluster1', 'cluster1', 'cluster1', 'cluster1', 'cluster1'];
def ports =  ['4344', '4344', '4344', '4344', '4344', '4344'];
def hostname = [ 'cluster1.com','cluster1.com','cluster1.com','cluster1.com','cluster1.com','cluster1.com' ];

def heapu = ['533.6', '526.72' , '518.82' , '515.73', '525.69', '517.71'] ;
def heapm = ['1212.15', '1212.15', '1212.15', '1212.15', '1212.15', '1212.15'];
def times = ['2017-10-08T07:26:21.050Z', '2017-10-08T07:26:11.042Z', '2017-10-08T07:25:51.047Z', '2017-10-08T07:25:31.055Z', '2017-10-08T07:26:01.047Z', '2017-10-08T07:25:41.041Z'] ;

for (int i = 0; i < cluster.size(); ++i){
    def c = cluster[i]
    def p = ports[i]
    def h = hostname[i]
    def hu = heapu[i]
    def hm = heapm[i]
    def t = times[i]

    results.add(['cluster': c,
                 'port': p,
                 'hostname': h,
                 'heap_used': hu,
                 'heap_max': hm,
                 'times': t])
    results = results.unique()
}
//    return ['results': results, 'singlex': singlex]

for (i = 0; i < results.size(); i++){
    println(results[i])
}

The output of this script looks like:

[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:533.6, heap_max:1212.15, times:2017-10-08T07:26:21.050Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:526.72, heap_max:1212.15, times:2017-10-08T07:26:11.042Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:518.82, heap_max:1212.15, times:2017-10-08T07:25:51.047Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:515.73, heap_max:1212.15, times:2017-10-08T07:25:31.055Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:525.69, heap_max:1212.15, times:2017-10-08T07:26:01.047Z]
[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:517.71, heap_max:1212.15, times:2017-10-08T07:25:41.041Z]

As can be seen from the output, I basically have six nearly identical lines that differ in their timestamp. The heap_used value also differs between lines (heap_max is the same), but that is not that important.

Since the cluster is the same for all six entries (cluster1), I consider them one output. Ideally, I would like to apply some sort of unique() function that gives me a single line as output, like the following:

[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:523.0450, heap_max:1212.15, times:2017-10-08T07:25:41.041Z]

where heap_used is the average of the six values, as is heap_max. I know that in Python with pandas I can do this with one command. However, I have no idea how to do it in Groovy, and I keep searching the internet.

EDIT: Groovy solution does not transfer 1:1 to Painless unfortunately.

score 2 · Answer 1 · answered Oct 08 '17 at 19:40

You can process your results list in the following way:

def grouped = results.groupBy { [it.cluster, it.port, it.hostname] }
        .entrySet()
        .collect { it -> [cluster: it.key.get(0), port: it.key.get(1), hostname: it.key.get(2)] + [
                heap_used: it.value.heap_used*.toBigDecimal().sum() / it.value.size(),
                heap_max: it.value.heap_max*.toBigDecimal().sum() / it.value.size(),
                times: it.value.times.max()
        ]}

First we group all list elements by the triplet of cluster, port and hostname. Then we collect each group into a single entry by combining cluster, port and hostname with heap_used: avg(heap_used), heap_max: avg(heap_max) and times: max(times).

Here

it.value.heap_used*.toBigDecimal().sum()

we take a list of all heap_used values (it.value.heap_used) and then use the spread operator (*.) to apply .toBigDecimal() to each list element, because your initial values are represented as strings. To calculate the average we simply divide the sum of all heap_used values by the size of the list.
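For readers who find the spread operator opaque, the same averaging step can be sketched as an explicit loop in plain Java. This is only an illustration, not part of the original answer: the class name AverageOfStrings and the method average are hypothetical. One difference from Groovy worth noting is that Java's BigDecimal.divide throws an ArithmeticException for a non-terminating quotient unless a rounding rule is supplied, so the sketch passes MathContext.DECIMAL64.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.Arrays;
import java.util.List;

public class AverageOfStrings {
    // Parse each string as a BigDecimal, add the values up, and divide by the count.
    static BigDecimal average(List<String> values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String value : values) {
            sum = sum.add(new BigDecimal(value));
        }
        // The MathContext makes a non-terminating quotient (such as 1/3)
        // round to 16 significant digits instead of throwing.
        return sum.divide(BigDecimal.valueOf(values.size()), MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        // heap_used values copied from the question
        List<String> heapUsed = Arrays.asList(
                "533.6", "526.72", "518.82", "515.73", "525.69", "517.71");
        System.out.println(average(heapUsed));
    }
}
```

For the six heap_used values from the question the result is numerically equal to 523.045, which matches the average the question asks for.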

Output

Printing the grouped variable displays the following result:

[[cluster:cluster1, port:4344, hostname:cluster1.com, heap_used:523.045, heap_max:1212.15, times:2017-10-08T07:26:21.050Z]]
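Since the comments below ask for a simpler, loop-based way to get the same result, here is a minimal sketch of the grouping and averaging written with plain loops and maps, without closures. It is written in Java rather than Groovy or Painless. Painless is described as Java-like, so a port may be possible, but that is an assumption and this code has not been run in Elasticsearch. The names GroupRows, groupAndAverage and sampleRows are hypothetical, and the timestamp comparison assumes all values are ISO-8601 UTC strings in the same format, so they sort correctly as plain strings.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupRows {
    // Groups rows by (cluster, port, hostname) and merges each group into one row.
    static List<Map<String, Object>> groupAndAverage(List<Map<String, String>> rows) {
        // Step 1: collect the rows that share the same key, keeping first-seen order.
        Map<List<String>, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            List<String> key = Arrays.asList(
                    row.get("cluster"), row.get("port"), row.get("hostname"));
            if (!groups.containsKey(key)) {
                groups.put(key, new ArrayList<>());
            }
            groups.get(key).add(row);
        }

        // Step 2: reduce each group to average heap values and the latest timestamp.
        List<Map<String, Object>> merged = new ArrayList<>();
        for (Map.Entry<List<String>, List<Map<String, String>>> group : groups.entrySet()) {
            BigDecimal usedSum = BigDecimal.ZERO;
            BigDecimal maxSum = BigDecimal.ZERO;
            String latest = "";
            for (Map<String, String> row : group.getValue()) {
                usedSum = usedSum.add(new BigDecimal(row.get("heap_used")));
                maxSum = maxSum.add(new BigDecimal(row.get("heap_max")));
                // Assumption: same-format ISO-8601 UTC strings compare chronologically.
                if (row.get("times").compareTo(latest) > 0) {
                    latest = row.get("times");
                }
            }
            BigDecimal count = BigDecimal.valueOf(group.getValue().size());
            Map<String, Object> out = new LinkedHashMap<>();
            out.put("cluster", group.getKey().get(0));
            out.put("port", group.getKey().get(1));
            out.put("hostname", group.getKey().get(2));
            out.put("heap_used", usedSum.divide(count, MathContext.DECIMAL64));
            out.put("heap_max", maxSum.divide(count, MathContext.DECIMAL64));
            out.put("times", latest);
            merged.add(out);
        }
        return merged;
    }

    // The six rows from the question.
    static List<Map<String, String>> sampleRows() {
        String[] heapUsed = {"533.6", "526.72", "518.82", "515.73", "525.69", "517.71"};
        String[] times = {
            "2017-10-08T07:26:21.050Z", "2017-10-08T07:26:11.042Z", "2017-10-08T07:25:51.047Z",
            "2017-10-08T07:25:31.055Z", "2017-10-08T07:26:01.047Z", "2017-10-08T07:25:41.041Z"};
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < heapUsed.length; i++) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("cluster", "cluster1");
            row.put("port", "4344");
            row.put("hostname", "cluster1.com");
            row.put("heap_used", heapUsed[i]);
            row.put("heap_max", "1212.15");
            row.put("times", times[i]);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Map<String, Object> row : groupAndAverage(sampleRows())) {
            System.out.println(row);
        }
    }
}
```

With the six rows from the question this produces a single merged row whose heap_used equals 523.045, whose heap_max equals 1212.15, and whose times value is 2017-10-08T07:26:21.050Z, in agreement with the Groovy output above. Using a list of the three key fields as the map key mirrors the triplet passed to groupBy in the answer.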

Szymon Stepniak


Hi @Szymon Stepniak, first of all let me thank you so much for your answer. I evaluated it in IntelliJ IDEA (using Groovy 2.4 and Java 9); the only problem is that `port: null`, which might be because of the Java version. **The problem is** that this solution should be implemented in an **elasticsearch watcher**, which uses the Painless language, which should be pretty much **groovy**. Apparently, it does not work :(. I am spending so much time on it. – user2156115 Oct 09 '17 at 09:47
"type" : "script_exception", "reason" : "compile error", "script_stack" : [ "... rouped = results.groupBy { [it.cluster, it.port, ...", " ^---- HERE" – user2156115 Oct 09 '17 at 09:54
"lang" : "painless", "caused_by" : { "type" : "illegal_argument_exception", "reason" : "unexpected token ['{'] was expecting one of [{, ';'}]." } – user2156115 Oct 09 '17 at 09:54
@user2156115 Regarding `port` thing - check if it's `port` or `post`. In your question it was named as `post` and I thought it was a typo. Maybe you store it in elasticsearch as `post` so `port` cannot be found. – Szymon Stepniak Oct 09 '17 at 10:04
Yes, that was a typo for sure, thanks. Can you maybe advise on the **painless** part of the question, because that is basically the main idea of this question? Or **is there maybe some, let's say, simpler way, like using a for loop, to get the same result**? To be honest, I am from the Python world and I am having a tough time understanding your code completely, though fragments of your solution make sense to me. – user2156115 Oct 09 '17 at 10:11
@user2156115 I have no idea what Painless is, never used it. In your question you have asked for a Groovy solution and the one I've shown you is a pure Groovy script. According to https://www.elastic.co/blog/painless-a-new-scripting-language it looks like Painless is something different than Groovy, but it has a syntax similar to Groovy. I'm adding a [tag: elasticsearch-painless] tag to your question, hope someone experienced with that scripting language will help you. – Szymon Stepniak Oct 09 '17 at 10:48
Yes, you are absolutely right, it is something different, though similar, as they claim. Anyway, **your answer was absolutely correct** when it comes to **groovy**. Thank you so much. Much appreciated. – user2156115 Oct 09 '17 at 11:02
