I have a dataset with, say, 1000 dates spanning a period of one month. I'd like to run an aggregation on this date field, but only on a few sample dates separated by a fixed interval (say, one week).

For example, for dates ranging from 1 Dec to 30 Dec, I should get buckets for the dates 1 Dec, 8 Dec, 15 Dec, 22 Dec and 29 Dec. PS: I don't want to use a date histogram here, because it groups the data into the given interval. For the example above, it would form buckets for 1-7, 8-14, and so on.

I looked at the sampler aggregation, and it requires a script to be provided. I couldn't figure out how to write the script so that it picks out the samples and passes them to the child aggregation.

Jai Sharma
  • Why not simply constrain a `date_histogram` aggregation with a filter that excludes every date except the ones you want to sample? – Val Feb 01 '17 at 04:51
  • As I said, the date histogram is going to **group** rather than **filter**. Could you illustrate your point with an example? – Jai Sharma Feb 01 '17 at 09:18

1 Answer

There are different ways to do this. One of them is to use a date_histogram aggregation constrained by a filter that will only select the desired dates:

```json
{
  "aggs": {
    "5_days": {
      "filter": {
        "bool": {
          "minimum_should_match": 1,
          "should": [
            { "range": { "date": { "gte": "2016-12-01T00:00:00.000Z", "lt": "2016-12-02T00:00:00.000Z" } } },
            { "range": { "date": { "gte": "2016-12-08T00:00:00.000Z", "lt": "2016-12-09T00:00:00.000Z" } } },
            { "range": { "date": { "gte": "2016-12-15T00:00:00.000Z", "lt": "2016-12-16T00:00:00.000Z" } } },
            { "range": { "date": { "gte": "2016-12-22T00:00:00.000Z", "lt": "2016-12-23T00:00:00.000Z" } } },
            { "range": { "date": { "gte": "2016-12-29T00:00:00.000Z", "lt": "2016-12-30T00:00:00.000Z" } } }
          ]
        }
      },
      "aggs": {
        "samples": {
          "date_histogram": {
            "field": "date",
            "interval": "day"
          }
        }
      }
    }
  }
}
```
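One possible refinement, offered as a sketch and not tested against a particular Elasticsearch version: `date_histogram` returns empty buckets by default for the days between the first and last matching day, so the response to the query above would also contain zero-count buckets for 2 to 7 December, 9 to 14 December, and so on. Setting `min_doc_count` to 1 in the `aggs` object of the `5_days` aggregation keeps only the sampled days that actually have documents:

```json
{
  "samples": {
    "date_histogram": {
      "field": "date",
      "interval": "day",
      "min_doc_count": 1
    }
  }
}
```

Note that a sampled day with no documents will then be missing from the response rather than reported with a count of 0. On Elasticsearch 7.2 and later, `interval` is deprecated in favour of `"calendar_interval": "day"`.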

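The second way is more concise and boils down to using a `date_range` aggregation with only the selected dates (in each range, `from` is inclusive and `to` is exclusive, so every bucket covers exactly one day):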

```json
{
    "aggs": {
        "range": {
            "date_range": {
                "field": "date",
                "ranges": [
                    { "from": "2016-12-01T00:00:00.000Z", "to": "2016-12-02T00:00:00.000Z" },
                    { "from": "2016-12-08T00:00:00.000Z", "to": "2016-12-09T00:00:00.000Z" },
                    { "from": "2016-12-15T00:00:00.000Z", "to": "2016-12-16T00:00:00.000Z" },
                    { "from": "2016-12-22T00:00:00.000Z", "to": "2016-12-23T00:00:00.000Z" },
                    { "from": "2016-12-29T00:00:00.000Z", "to": "2016-12-30T00:00:00.000Z" }
                ]
            }
        }
    }
}
```
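Since the question asks about passing the samples to a child aggregation: `date_range` is a bucket aggregation, so each sampled day can carry its own sub-aggregations. The sketch below assumes a hypothetical numeric field named `value` and computes its average per sampled day; the aggregation names `weekly_samples` and `avg_value` are likewise illustrative. The optional `key` on each range just gives the bucket a readable label, and `"size": 0` skips returning the search hits themselves:

```json
{
  "size": 0,
  "aggs": {
    "weekly_samples": {
      "date_range": {
        "field": "date",
        "ranges": [
          { "key": "2016-12-01", "from": "2016-12-01T00:00:00.000Z", "to": "2016-12-02T00:00:00.000Z" },
          { "key": "2016-12-08", "from": "2016-12-08T00:00:00.000Z", "to": "2016-12-09T00:00:00.000Z" },
          { "key": "2016-12-15", "from": "2016-12-15T00:00:00.000Z", "to": "2016-12-16T00:00:00.000Z" },
          { "key": "2016-12-22", "from": "2016-12-22T00:00:00.000Z", "to": "2016-12-23T00:00:00.000Z" },
          { "key": "2016-12-29", "from": "2016-12-29T00:00:00.000Z", "to": "2016-12-30T00:00:00.000Z" }
        ]
      },
      "aggs": {
        "avg_value": {
          "avg": { "field": "value" }
        }
      }
    }
  }
}
```

Unlike the filtered `date_histogram`, this returns exactly one bucket per listed range, including ranges with no matching documents (reported with a `doc_count` of 0).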
Val
  • The first option looks good and should work. Just a quick question: why did you use **interval = daily** and not **weekly**? Would it make a difference? – Jai Sharma Feb 15 '17 at 05:30
  • Right, but since you're filtering out the dates in between and keeping only one date per week, weekly and daily intervals should behave the same IMO. – Jai Sharma Feb 15 '17 at 05:36
  • Yes, that's correct, but `daily` is more consistent with the bucket size you'll get. – Val Feb 15 '17 at 05:38