I'm trying out Spark with Java and MongoDB, and I want to aggregate several documents into a single one based on their timestamps. For example, I want to aggregate X documents like these into a single one:
{
"_id" : ObjectId("598c32f455f0353f9e69ebf1"),
"_class" : "...",
"timestamp" : ISODate("2017-08-10T10:17:00.000Z"),
"value" : 10.1
}
...
{
"_id" : ObjectId("598c32f455f0353f9e69ebz2"),
"_class" : "...",
"timestamp" : ISODate("2017-08-10T10:18:00.000Z"),
"value" : 2.1
}
Let's say I have 60 documents like these whose timestamps fall within a 1-minute window (from 10:17:00 to 10:18:00), and I want to obtain one document:
{
"_id" : ObjectId("598c32f455f0353f9e69e231"),
"_class" : "...",
"start_timestamp" : ISODate("2017-08-10T10:17:00.000Z"),
"end_timestamp" : ISODate("2017-08-10T10:18:00.000Z"),
"average_value" : **average value of those documents**
}
Is it possible to perform this kind of transformation? Can I retrieve one 1-minute window of data at a time?
An approach that loads all the documents and compares their timestamps one by one looks slow and inefficient.
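To make the goal concrete, here is a minimal plain-Java sketch of the grouping I have in mind, without Spark or MongoDB involved: truncate each timestamp to the minute and average the values per bucket. The `Reading` class and the `averagePerMinute` method are names I made up for illustration, and the sketch assumes a document stamped exactly 10:18:00 belongs to the next window rather than the 10:17 one.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MinuteWindows {

    // Hypothetical stand-in for the MongoDB document: only timestamp and value.
    static final class Reading {
        final Instant timestamp;
        final double value;

        Reading(Instant timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }

        double getValue() {
            return value;
        }
    }

    // Groups readings by the start of their 1-minute window and averages the values.
    // The map key is the window start; the window end would be key + 1 minute.
    static Map<Instant, Double> averagePerMinute(List<Reading> readings) {
        return readings.stream().collect(Collectors.groupingBy(
                r -> r.timestamp.truncatedTo(ChronoUnit.MINUTES),
                TreeMap::new, // keep the windows sorted by start time
                Collectors.averagingDouble(Reading::getValue)));
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
                new Reading(Instant.parse("2017-08-10T10:17:00Z"), 10.1),
                new Reading(Instant.parse("2017-08-10T10:17:30Z"), 2.1),
                new Reading(Instant.parse("2017-08-10T10:18:00Z"), 5.0));

        // Print one line per window: start, end, and average value.
        averagePerMinute(readings).forEach((start, avg) ->
                System.out.println(start + " .. " + start.plus(1, ChronoUnit.MINUTES) + " avg=" + avg));
    }
}
```

This is only the logic on an in-memory list. What I'm looking for is the equivalent in Spark, ideally without pulling every document to the driver first.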
Thanks in advance.