I have a two-level sub-aggregation: `['field1', 'field2']`. Both fields are terms aggregations. The way Elasticsearch returns aggregations isn't very convenient, with all the buckets nested inside other buckets. I am having trouble transforming the Elasticsearch results into a list of dicts, e.g.

Fake Elasticsearch results:

'aggregations':{
    'field1':{
        'buckets':[
            {
                'key':'value1',
                'field2':{
                    'buckets':[
                        {
                            'key':'1.1.1.1',
                            'doc_count':15
                        },
                        {
                            'key': '2.2.2.2',
                            'doc_count': 12
                        }

                    ]

                }
            },
            {
                'key': 'value2',
                'field2': {
                    'buckets': [
                        {
                            'key': '3.3.3.3',
                            'doc_count': 15
                        },
                        {
                            'key': '4.4.4.4',
                            'doc_count': 12
                        }
                     ]
                 }

            },
            {
                'key': 'value3',
                'field2': {
                    'buckets': [
                        {
                            'key': '5.5.5.5',
                            'doc_count': 15
                        },
                        {
                            'key': '6.6.6.6',
                            'doc_count': 12
                        }
                     ]
                 }
            }
        ]
    }
}

I would like the result to be in the form of this:

[{'field1':'value1', 'field2':'1.1.1.1'}, 
 {'field1':'value1', 'field2':'2.2.2.2'},
 {'field1':'value2', 'field2':'3.3.3.3'},
 {'field1':'value2', 'field2':'4.4.4.4'},
 {'field1':'value3', 'field2':'5.5.5.5'},
 {'field1':'value3', 'field2':'6.6.6.6'} ]

like a normal database table with rows and columns. The aggregation name must be used as the column name; this is necessary. I have thought of building a tree representation of the data and then traversing it with DFS to create each row of the results, but I need a place to start.

Apostolos
  • Why not parse the buckets on the client side? 2-3 lines of Python would take care of transforming your result into the desired structure. – Val Nov 22 '16 at 09:38
  • @Val I am talking about the client side. It's not a very easy problem in my opinion. Kibana does it in the frontend too, using a similar approach with linked lists and traversing the buckets and metrics. – Apostolos Nov 23 '16 at 11:09

1 Answer

If you load those JSON aggregation results into a dictionary (agg = json.loads('{...}')), you can then iterate over it with just a few lines of code:

fields = []
for bucket in agg['aggregations']['field1']['buckets']:
    for sub in bucket['field2']['buckets']:
        fields.append({'field1': bucket['key'], 'field2': sub['key']})

After running this, the fields list will contain exactly what you need (the JSON below was obtained with json.dumps(fields)):

[
  {
    "field2": "1.1.1.1",
    "field1": "value1"
  },
  {
    "field2": "2.2.2.2",
    "field1": "value1"
  },
  {
    "field2": "3.3.3.3",
    "field1": "value2"
  },
  {
    "field2": "4.4.4.4",
    "field1": "value2"
  },
  {
    "field2": "5.5.5.5",
    "field1": "value3"
  },
  {
    "field2": "6.6.6.6",
    "field1": "value3"
  }
]
Val
  • First, thanks for taking the time to answer. The problem (which I should have stated) is that the number of nested bucket levels is only known at runtime and varies. So yes, your approach works, but field1, field2, field3, etc. aren't known from the beginning, so I need a way for the client to find out how deep the "rabbit hole" goes. – Apostolos Nov 28 '16 at 11:27
  • You can do it the same way, by iterating over the keys if they are not known in advance. I took field1, field2, etc to show the idea, but even if you don't know them you can get the dictionary keys and iterate over them. – Val Nov 28 '16 at 11:36
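
For the case raised in the comments, where the aggregation names and the nesting depth are only known at runtime, the idea of iterating over the dictionary keys could be sketched roughly as below. This is only an illustration, not part of the original answer: the function name flatten_buckets is hypothetical, and it assumes that every sub-aggregation is a dict holding a 'buckets' list (as in the sample response above) and that there is a single chain of nested aggregations rather than several siblings at the same level.

```python
def flatten_buckets(aggs, row=None):
    """Depth-first walk over nested bucket aggregations.

    Returns one dict per leaf bucket, mapping each aggregation
    name to the bucket key found on the path to that leaf.
    """
    row = row or {}
    rows = []
    for name, value in aggs.items():
        # Assumption: a dict value with a 'buckets' list is a (sub-)aggregation;
        # everything else ('key', 'doc_count', metrics) is skipped.
        if isinstance(value, dict) and isinstance(value.get('buckets'), list):
            for bucket in value['buckets']:
                current = dict(row)
                current[name] = bucket['key']
                # Recurse into the bucket to look for deeper aggregations.
                children = flatten_buckets(bucket, current)
                # No deeper level found: this bucket is a leaf, so emit the row.
                rows.extend(children if children else [current])
    return rows
```

Calling flatten_buckets(agg['aggregations']) on the sample response should give the same list of field1/field2 rows as the fixed two-loop version, without naming either field in the code.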