4

I have an index of SendGrid event data:

"_source": {
    "externalId": "9283cc1d-b003-xxxx-a5af-84fcf31c4181",
    "email": "test@test.com",
    "timestamp": 1616515214,
    "event": "processed",
    "uid": null,
    "id": null,
    "sendgridEventId": null,
    "smtpId": null,
    "sgMessageId": null,
    "sgEventId": null,
    "sendgridEvent": null,
    "type": null,
    "category": [],
    "reason": null,
    "status": null,
    "url": null,
    "useragent": null,
    "ip": null,
    "response": null,
    "tls": null,
    "attempt": null,
    "sendAt": null,
    "asmGroupId": null
}

Now I like to aggregate all of these events for a given day using the timestamp attribute.

GET /sendgridevententity/_search
{
  "query":
   {
    "match_all": {}
   },
   "aggs": {
     "amount_per_day": {
       "date_histogram": {
         "field": "timestamp",
         "calendar_interval": "1d"
       }
     }
   }
}  

Unfortunately, this just yields all the single events as they all have a different timestamp and the aggregation does not group them by day.

How can I convert the timestamps to date and then run the aggregation?

Joe - GMapsBook.com
  • 15,787
  • 4
  • 23
  • 68
Alex
  • 1,857
  • 3
  • 36
  • 51

1 Answers1

4

You can take advantage of a multi field mapping. Here's how it works.

  1. Update the existing mapping with a new date "sub-field". I'm assuming timestamp was originally mapped as a long. I'm also assuming the timestamps are in epoch seconds, thereby the explicitly set format:
POST sendgridevententity/_mapping
{
  "properties": {
    "timestamp": {
      "type": "long",
      "fields": {
        "as_date": {
          "type": "date",
          "format": "epoch_second"
        }
      }
    }
  }
}
  1. This new property now needs to be picked up and your data needs to be reindexed. You can trigger a reindex call via a cool little trick — sending an empty _update_by_query request:
POST sendgridevententity/_update_by_query
  1. After the reindex operation finishes, you can target the new date field through the dot notation:
GET /sendgridevententity/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "amount_per_day": {
      "date_histogram": {
        "field": "timestamp.as_date",
        "format": "yyyy-MM-dd", 
        "calendar_interval": "1d"
      }
    }
  }
}

⚠️ Depending your index size and many other factors, the _update_by_query request may appear to time out. It's possible to set wait_for_completion=false which'll trigger an asynchronous background task instead.

Note that I used size: 0 in the final request. It's a handy tool for returning only the aggregation results.

Joe - GMapsBook.com
  • 15,787
  • 4
  • 23
  • 68
  • One additional question: I like to use it in a Kibana Visualization. But for the Bucket I can not use the format and it just states that there is no field of type "date". Is there a way to use this field there as well? – Alex Mar 26 '21 at 15:09
  • 1
    Yes, it should be. It may be necessary to recreate the [Index Pattern](https://www.elastic.co/guide/en/kibana/current/index-patterns.html) though so that it picks up the date field too. BTW it's not `date` -- it's `timestamp.as_date` and that's what should be able to choose from the dropdown when the index pattern is up to date. – Joe - GMapsBook.com Mar 26 '21 at 15:16