1

I have setup snowplow with Elasticsearch.

When I want to get the data out I just do normal queries and use aggregates to get them by day, country etc.

So I want to figure out clickthru rate for these aggregations, I have 2 kind of events: page views and clicks.

Currently I do 2 queries:

Page Views:

{
    "size": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "event": "page_view"
                            }
                        }
                    ],
                    "must_not": {
                        "term": {
                            "br_family": "Robot"
                        }
                    }
                }
            }
        }
    },
    "aggs": {
        "dates": {
            "date_histogram": {
                "field": "collector_tstamp",
                "interval": "day"
            }
        }
    }
}

Clicks:

{
    "size": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "event": "struct"
                            }
                        },
                        {
                            "term": {
                                "se_action": "click"
                            }
                        }
                    ],
                    "must_not": {
                        "term": {
                            "br_family": "Robot"
                        }
                    }
                }
            }
        }
    },
    "aggs": {
        "dates": {
            "date_histogram": {
                "field": "collector_tstamp",
                "interval": "day"
            }
        }
    }
}

I format the response to something easier to use and then merge them in PHP using something like this.

function merge_metrics($pv,$c){
    $r = array();

    if(count($pv) > 0){
        foreach ($pv as $key => $value) {
            $r[$value['name']]['page_views'] += $value['count']; 
        }
    }
    if(count($c) > 0){
        foreach ($c as $key => $value) {
            $r[$value['name']]['clicks'] += $value['count']; 
        }
    }

    $rf = array();

    foreach ($r as $key => $value) {
        $tmp_clicks = isset($value['clicks']) ? $value['clicks'] : 0;
        $tmp_page_views = isset($value['page_views']) ? isset($value['page_views']) : 0;
        $rf[] = array(
                'name' => $key,
                'page_views' => $tmp_page_views,
                'clicks' => $tmp_clicks,
                'ctr' => ctr($tmp_clicks,$tmp_page_views)
            ); 
    }

    return $rf;
}

Both $pv and $c are arrays that contain the aggregates that result from querying Elasticsearch and I do some formatting for ease of use.

My question is:

Is it possible get multiple metrics(in my case page views and clicks, these are specific filters) and perform same aggregations on both ? then returning the aggregations something like :

{
    "data": [
        {
            "day": "2015-10-13",
            "page_views": 61,
            "clicks": 0,
        },
        {
            "day": "2015-10-14",
            "page_views": 135,
            "clicks": 1,
        },
        {
            "day": "2015-10-15",
            "page_views": 39,
            "clicks": 0,
        }
    ]
}

But without me having to manually merge them ?

XkiD
  • 159
  • 1
  • 14

1 Answers1

2

Yes, it is definitely possible if you merge your aggregations into one single query. For instance, I suppose you have one query like this for page views:

{
    "query": {...}
    "aggregations": {
        "by_day": {
            "date_histogram": {
                "field": "day",
                "interval": "day"
            },
            "aggs": {
                "page_views_per_day": {
                    "sum": {
                        "field": "page_views"
                    }
                }
            }
        }
    }
}

And another query like this for clicks:

{
    "query": {...}
    "aggregations": {
        "by_day": {
            "date_histogram": {
                "field": "day",
                "interval": "day"
            },
            "aggs": {
                "clicks_per_day": {
                    "sum": {
                        "field": "clicks"
                    }
                }
            }
        }
    }
}

Provided you have the same constraints in your query, you can definitely merge them together at the date_histogram level, like this:

{
    "query": {...}
    "aggregations": {
        "by_day": {
            "date_histogram": {
                "field": "day",
                "interval": "day"
            },
            "aggs": {
                "page_views_per_day": {
                    "sum": {
                        "field": "page_views"
                    }
                },
                "clicks_per_day": {
                    "sum": {
                        "field": "clicks"
                    }
                }
            }
        }
    }
}

UPDATE

Since your queries are different for each of your aggregations, we need to do it slightly differently, i.e. by using an additional filters aggregation, like this:

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "event": [
                  "page_view",
                  "struct"
                ]
              }
            }
          ],
          "should": {
            "term": {
              "se_action": "click"
            }
          },
          "must_not": {
            "term": {
              "br_family": "Robot"
            }
          }
        }
      }
    }
  },
  "aggs": {
    "dates": {
      "date_histogram": {
        "field": "collector_tstamp",
        "interval": "day"
      },
      "aggs": {
        "my_filters": {
          "filters": {
            "filters": {
              "page_views_filter": {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "event": "page_view"
                      }
                    }
                  ],
                  "must_not": {
                    "term": {
                      "br_family": "Robot"
                    }
                  }
                }
              },
              "clicks_filter": {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "event": "struct"
                      }
                    },
                    {
                      "term": {
                        "se_action": "click"
                      }
                    }
                  ],
                  "must_not": {
                    "term": {
                      "br_family": "Robot"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Now for each daily bucket, you're going to end up with two sub-buckets, one for the count of page views and another for the count of clicks.

Val
  • 207,596
  • 13
  • 358
  • 360
  • Hello @Val, thank you for your response, I just updated the question to include my queries. – XkiD Oct 15 '15 at 11:32
  • I am guessing the merged query would now look different with my actual queries ? The difference is that I filter for those counts they are not in a field. – XkiD Oct 15 '15 at 12:07
  • Would it make sense still include the filters for the page views and clicks in the filtered query with SHOULD? so the aggregate function just uses the filtered data, this is for performance in case I have a very big database. – XkiD Oct 15 '15 at 12:20
  • 1
    Yes, you can definitely do that of course if you have a large amount of data. I've modified my answer again to reflect that. – Val Oct 15 '15 at 12:21
  • For some strange reason.... ""aggs": { "filters": { "filters": {" does not simply work ... it needs to be like this "aggs": { "whatever_you_want_here": { "filters": { "filters": { – XkiD Oct 16 '15 at 12:53
  • Do you happen to know why filters has to contain another array filters ? it's very strange to me... – XkiD Oct 16 '15 at 12:55
  • True, I've missed the aggregation name. I've updated my answer accordingly. Should work now. See [my answer to another question](http://stackoverflow.com/questions/32489706/need-explanation-on-elasticsearch-filters-aggregation/32492198#32492198) for the reasons why this is so. – Val Oct 16 '15 at 12:57