The count of distinct PARTY_ID values should be the same in both aggregations. The top-level aggregation returns 3000, but the per-bucket values sum to 2675 + 244 + 41 + 6 + 2 = 2968. Why are the two not equal?

GET /test/data/_search
{
   "size": 0,
   "aggs": {
      "ASSET_CLASS": {
         "terms": {
            "field": "ASSET_CLASS_WORST"
         },
         "aggs": {
            "ASSET_CLASS": {
               "cardinality": {
                  "field": "PARTY_ID"
               }
            }
         }
      },
      "Total count": {
         "cardinality": {
            "field": "PARTY_ID"
         }
      }
   }
}

Result:

{
   "took": 9,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 51891,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "Total count": {
         "value": 3000
      },
      "ASSET_CLASS": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "NPA",
               "doc_count": 49252,
               "ASSET_CLASS": {
                  "value": 2675
               }
            },
            {
               "key": "RESTRUCTURED",
               "doc_count": 2275,
               "ASSET_CLASS": {
                  "value": 244
               }
            },
            {
               "key": "SMA2",
               "doc_count": 308,
               "ASSET_CLASS": {
                  "value": 41
               }
            },
            {
               "key": "SMA1",
               "doc_count": 42,
               "ASSET_CLASS": {
                  "value": 6
               }
            },
            {
               "key": "SMA0",
               "doc_count": 14,
               "ASSET_CLASS": {
                  "value": 2
               }
            }
         ]
      }
   }
}
Bond

1 Answer

The first line of the documentation for cardinality aggregation reads:

A single-value metrics aggregation that calculates an approximate count of distinct values.

(emphasis mine)

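The bucket values in your result sum to 2968, so the discrepancy is 32 out of 3000, or about 1%, which is to be expected from an approximate count. Note that with exact counts the per-bucket sum could only be equal to or greater than the total, because a PARTY_ID that appears in several buckets is counted once in each of them; a sum below the total therefore points to approximation error.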
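The size of the discrepancy can be checked directly from the figures in the question's response. This is only arithmetic on the posted values; the variable names are illustrative.

```python
# Per-bucket cardinality values copied from the response in the question.
bucket_counts = [2675, 244, 41, 6, 2]
# Top-level "Total count" cardinality from the same response.
total = 3000

bucket_sum = sum(bucket_counts)
diff = total - bucket_sum
rel = diff / total

print(bucket_sum, diff, f"{rel:.2%}")  # prints: 2968 32 1.07%
```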

The cardinality aggregation is based on the HyperLogLog++ algorithm, which has useful properties such as bounded memory usage (fixed by the configured precision, regardless of how many values are counted) and O(N) time complexity.
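To illustrate why such a count is approximate, here is a minimal sketch of the basic HyperLogLog idea in Python. It is not Elasticsearch's implementation (which adds further refinements); the function name hll_estimate, the choice of SHA-1 as the hash, and the register count are assumptions made for the example.

```python
import hashlib
import math

def hll_estimate(values, p=12):
    """Estimate the number of distinct values with a basic HyperLogLog sketch."""
    m = 1 << p                 # number of registers: memory is fixed at m small ints
    registers = [0] * m
    for v in values:           # single pass over the input: O(N) time
        # Derive a 64-bit hash from the value (SHA-1 is an arbitrary choice here).
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                    # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)           # bias correction, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw

estimate = hll_estimate(range(3000))
print(round(estimate))  # close to 3000, but usually not exactly 3000
```

Because each register only keeps a maximum, feeding the same values again does not change the estimate, which is what makes the structure suitable for distinct counts. The price is a small relative error that depends on the number of registers.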

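If you need more precise results, try a higher setting for the precision_threshold parameter. Counts below the threshold are expected to be close to exact; the default is 3000 and the maximum supported value is 40000.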

GET /test/data/_search
{
   "size": 0,
   "aggs": {
      "ASSET_CLASS": {
         "terms": {
            "field": "ASSET_CLASS_WORST"
         },
         "aggs": {
            "ASSET_CLASS": {
               "cardinality": {
                  "field": "PARTY_ID",
                  "precision_threshold": 10000
               }
            }
         }
      },
      "Total count": {
         "cardinality": {
            "field": "PARTY_ID",
            "precision_threshold": 10000
         }
      }
   }
}
Shadocko