0

Getting different results while searching on the date range aggregation indexing.

Created the index like below.

curl -XPUT -H 'Content-Type: application/json' 'http://x.x.x.x:9200/date_index' -d '{
  "settings" : { "keyspace" : "keyspace1"},
  "mappings" : {
    "table1" : {
      "discover":"sent_date",
      "properties" : {
        "sent_date" : { "type": "date", "format": "yyyy-MM-dd HH:mm:ssZZ" }
        }
    }
  }
}'

When trying searching with below code, i am getting different date range results.

    curl -XGET -H 'Content-Type: application/json' 'http://x.x.x.x:9200/date_index/_search?pretty=true' -d '
    {
      "aggs" : {
        "sentdate_range_search" : {
          "date_range" : {
            "field" : "sent_date",
            "time_zone": "UTC",
            "format" : "yyyy-MM-dd HH:mm:ssZZ",
            "ranges" : [
              { "from" : "2010-05-07 11:22:34+0000", "to" : "2011-05-07 11:22:34+0000"}
            ]
      }
    }
  }
}'

Sample output, showing different results like 2039, 2024 etc.

{
  "took" : 26,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 417427,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "date_index",
        "_type" : "table1",
        "_id" : "P89200822_4210021505784",
        "_score" : 1.0,
        "_source" : {
          "sent_date" : "2039-05-22T14:45:39.000Z"
        }
      },
      {
        "_index" : "date_index",
        "_type" : "table1",
        "_id" : "P89200605_4210020537428",
        "_score" : 1.0,
        "_source" : {
           "sent_date" : "2024-06-05T07:20:57.000Z"
        }
      },
      .........
    "aggregations" : {
    "sentdate_range_search" : {
      "buckets" : [
        {
          "key" : "2010-05-07 11:22:34+00:00-2011-05-07 11:22:34+00:00",
          "from" : 1.273231354E12,
          "from_as_string" : "2010-05-07 11:22:34+00:00",
          "to" : 1.304767354E12,
          "to_as_string" : "2011-05-07 11:22:34+00:00",
          "doc_count" : 0
         }
      ]
    }
  }

FYI: I am using the data that was resided in Cassandra Database where the field "sent_date" is stored with UTC timezone.

Please advise, thanks

Pramod
  • 113
  • 2
  • 14

1 Answers1

0

== Reworked answer based on conversation in the comments ==

Aggregations are different than search queries. Aggregations combine records (i.e. aggregate!) along specified dimensions. The query in the question aggregates records that fall between the two specified dates into a single bucket. More info on aggregations can be found in the Elasticsearch documentation

Since the requirement is to filter records that fall between two dates, a date range filter is the appropriate approach:

GET date_index/_search
{
   "query": {
       "bool": {
           "filter": {
               "range": {
                   "sent_date": {
                       "gte": "2010-05-07 11:22:34+0000",
                       "lte": "2011-05-07 11:22:34+0000"
                   }
               }
            }
        }
    }
}

Why filter instead of regular query? Filters are faster than searches because they don't contribute to document scoring and they're cacheable. You can combine filters and searches to, for example, get all records within the given time range that match the phrase "all work and no play makes jack a dull boy."

mike b
  • 484
  • 2
  • 7
  • I am not bothering about the number of output results, here i am getting wrong output. In the question, i am requesting results for date range "from" : "2010-05-07 11:22:34+0000", "to" : "2011-05-07 11:22:34+0000". But I am getting results of 2039.., 2024.. etc. – Pramod Nov 02 '18 at 16:16
  • but you aren't: the search results are outside the range, but the agg results look correct: "sentdate_range_search" : { "buckets" : [ { "key" : "2010-05-07 11:22:34+00:00-2011-05-07 11:22:34+00:00", "from" : 1.273231354E12, "from_as_string" : "2010-05-07 11:22:34+00:00", "to" : 1.304767354E12, "to_as_string" : "2011-05-07 11:22:34+00:00", "doc_count" : 0 } ] } – mike b Nov 02 '18 at 16:22
  • Sorry I am newbie, if my understanding was wrong. As you said, why are the search results outside the range. Although the aggregation result looking correct. Any thing i am missing. – Pramod Nov 02 '18 at 16:31
  • query and aggregations are different parts of the `_search` API. Since you only specified an aggregation, you got the right agg results, but essentially did a "search all" search. So, if you want to only get back the agg results, set the `size=0` like in my answer. If you only want to search, remove the aggregation and do something like `query { range { sent_date { gte: "2010-05-07 11:22:34+0000", lte: "2011-05-07 11:22:34+0000" } } }`. That make sense? [link to Elasticsearch range query docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html) – mike b Nov 02 '18 at 16:36
  • Thanks, this worked `query { range { sent_date { gte: "2010-05-07 11:22:34+0000", lte: "2011-05-07 11:22:34+0000" } } }` – Pramod Nov 02 '18 at 16:46