
I'm working with a dataset of Facebook posts and am trying to build an aggregation in Elasticsearch. Specifically, I want to bucket the posts by user with a `terms` aggregation on the `author_id` field.

Here's my code for that: `search_object.aggs.bucket("users", "terms", field="author_id", size=n_sample_users)`
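For reference, that elasticsearch-dsl call corresponds to a plain JSON request body along these lines. This is a sketch: the function name `users_terms_body` is hypothetical, and `"size": 0` (return only aggregations, no hits) is an assumption that `search_object` itself does not set.

```python
def users_terms_body(n_sample_users):
    """Hypothetical helper: raw request body for the per-user terms bucket."""
    return {
        "size": 0,  # assumption: only aggregation results are needed, not the hits
        "aggs": {
            "users": {
                # one bucket per author_id, at most n_sample_users buckets
                "terms": {"field": "author_id", "size": n_sample_users}
            }
        },
    }
```

Calling `search_object.to_dict()` in elasticsearch-dsl should show an equivalent `aggs` section.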

For each user, I want to calculate `num_of_unique_days_active`: the number of distinct days on which the user made at least one post.
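To pin down the intended semantics, here is a small in-memory reference computation (not Elasticsearch). It assumes posts arrive as hypothetical `(author_id, created_at)` pairs and that "day" means the UTC calendar date of the timestamp.

```python
from collections import defaultdict
from datetime import datetime


def unique_days_active(posts):
    """Count distinct UTC calendar dates per author from (author_id, created_at) pairs."""
    days = defaultdict(set)
    for author_id, created_at in posts:
        # created_at follows yyyy-MM-dd'T'HH:mm:ss.SSS'Z'; keep only the date part
        ts = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%S.%fZ")
        days[author_id].add(ts.date())
    return {author: len(dates) for author, dates in days.items()}
```

Two posts at 01:00 and 23:59 on the same date count once; a post just after midnight counts as a new day.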

My first idea was to use a `cardinality` metric aggregation on the `created_at` field to count the unique days on which the user posted. However, `created_at` is formatted as `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'`, so it contains the time of each post as well as the date, and a cardinality on the raw field would count distinct timestamps rather than distinct days.
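One possible approach is to run `cardinality` on a script that truncates each timestamp to a day number instead of on the raw field. The sketch below assumes Elasticsearch 7.x or 8.x, where `doc['created_at'].value` exposes `toInstant()` in Painless (older versions may need a different accessor), and treats days as UTC. The helper names are hypothetical; `utc_day_index` only mirrors the script in Python so the day boundaries can be checked locally.

```python
from datetime import datetime, timedelta, timezone

# Painless: epoch milliseconds integer-divided by ms per day gives the UTC day number.
# Assumption: ES 7.x/8.x date doc values support toInstant().
DAY_SCRIPT = "doc['created_at'].value.toInstant().toEpochMilli() / 86400000L"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unique_days_cardinality_body(n_sample_users):
    """Hypothetical helper: terms per user, then cardinality of the day number."""
    return {
        "size": 0,
        "aggs": {
            "users": {
                "terms": {"field": "author_id", "size": n_sample_users},
                "aggs": {
                    "num_of_unique_days_active": {
                        "cardinality": {
                            "script": {"lang": "painless", "source": DAY_SCRIPT}
                        }
                    }
                },
            }
        },
    }


def utc_day_index(created_at):
    """Python mirror of DAY_SCRIPT: whole UTC days since 1970-01-01."""
    ts = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return (ts - EPOCH) // timedelta(days=1)
```

With elasticsearch-dsl, the same sub-aggregation could probably be chained onto the existing bucket with `.metric("num_of_unique_days_active", "cardinality", script={"source": DAY_SCRIPT})`. Keep in mind that `cardinality` is approximate, though counts below its `precision_threshold` are expected to be close to exact, and the script runs for every matching document.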

I'm not sure how to compute the cardinality on the date part only, ignoring the time of each post. I've also tried putting a `date_histogram` aggregation before the `cardinality` aggregation, but I haven't been able to get it to work.
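An alternative that avoids scripts is to nest a daily `date_histogram` under the `terms` bucket and count the non-empty day buckets per user on the client side. This sketch assumes Elasticsearch 7.2+ (for `calendar_interval`; older versions use `interval`) and a response already converted to a plain dict; the function names are hypothetical.

```python
def daily_histogram_body(n_sample_users):
    """Hypothetical helper: one date_histogram bucket per active day, per user."""
    return {
        "size": 0,
        "aggs": {
            "users": {
                "terms": {"field": "author_id", "size": n_sample_users},
                "aggs": {
                    "active_days": {
                        "date_histogram": {
                            "field": "created_at",
                            "calendar_interval": "day",
                            # default 0 would also return empty days between posts
                            "min_doc_count": 1,
                        }
                    }
                },
            }
        },
    }


def unique_days_from_response(response):
    """Map author_id -> number of non-empty day buckets."""
    return {
        bucket["key"]: len(bucket["active_days"]["buckets"])
        for bucket in response["aggregations"]["users"]["buckets"]
    }
```

This count is exact, but it returns one bucket per user per active day, which can run into the cluster's bucket limit for many users or long time ranges. Days are UTC unless a `time_zone` is set on the `date_histogram`.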

Does anyone have advice on how to calculate the number of unique days on which a user has posted, based on a timestamp field that includes both date and time? Any working code examples would be greatly appreciated.
