3

I have taken the user count from the elastic search index. same query but different histogram interval types like Day, Month, Week, Quarter, and year the counts are not matching correctly

note: totally I have only 4 months of data for this year

This is for month interval ES search query

{
     "query": {
            "range": {
              "@timestamp": {
                "gte": "2022-01-01",
                "lte": "2022-04-14"
              }
            }}, 
  "aggs": {
    "dt": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month",
        "format": "yyyy-MM-dd"
         
      },          
  "aggs": {
    "events": {
      "nested": {
        "path": "events"
      },
 
  "aggs": {
    "unique_user_count": {
      "cardinality": {
        "field": "events.actor.id.keyword"
      }
    }
  }
}}}}}
  

got below Month results (response)

{
  "aggregations": {
    "dt": {
      "buckets": [
        {
          "key_as_string": "2022-01-01",
          "key": 1640995200000,
          "doc_count": 2930,
          "events": {
            "doc_count": 13988,
            "unique_user_count": {
              "value": 37
            }
          }
        },
        {
          "key_as_string": "2022-02-01",
          "key": 1643673600000,
          "doc_count": 36910,
          "events": {
            "doc_count": 175151,
            "unique_user_count": {
              "value": 580
            }
          }
        },
        {
          "key_as_string": "2022-03-01",
          "key": 1646092800000,
          "doc_count": 24861,
          "events": {
            "doc_count": 133383,
            "unique_user_count": {
              "value": 555
            }
          }
        },
        {
          "key_as_string": "2022-04-01",
          "key": 1648771200000,
          "doc_count": 6005,
          "events": {
            "doc_count": 30730,
            "unique_user_count": {
              "value": 170
            }
          }
        }
      ]
    }
  }
}

Again I run the same query but changed intervals = Year

{
         "query": {
                "range": {
                  "@timestamp": {
                    "gte": "2022-01-01",
                    "lte": "2022-04-14"
                  }
                }}, 
      "aggs": {
        "dt": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "year",
            "format": "yyyy-MM-dd"
             
          },          
      "aggs": {
        "events": {
          "nested": {
            "path": "events"
          },
     
      "aggs": {
        "unique_user_count": {
          "cardinality": {
            "field": "events.actor.id.keyword"
          }
        }
      }
    }}}}}

   

I got the below Year response

{
  "aggregations": {
    "dt": {
      "buckets": [
        {
          "key_as_string": "2022-01-01",
          "key": 1640995200000,
          "doc_count": 70706,
          "events": {
            "doc_count": 353252,
            "unique_user_count": {
              "value": 1007
            }
          }
        }
      ]
    }
  }
}

My Expected results like this year = 37+580+555+170 year = 1342 ----> but I got "1007" wrong values

how do match the sum(month) value and year values?

Samynathan
  • 51
  • 3

1 Answers1

2

In your monthly buckets you're running a cardinality aggregation to get the unique user count per month.

If you're running the same aggregation over a year, the unique user count cannot be the sum of the monthly user count because a given user might have had interactions during several months.

If you compare the total events count they do match: 13988 + 175151 + 133383 + 30730 = 353252

So all is fine, you just need to compare apples to apples

Val
  • 207,596
  • 13
  • 358
  • 360