
I'm trying out Spark with Java and MongoDB, and I want to aggregate several documents into a single one based on their timestamps. For example, I want to aggregate X documents into a single one:

{
    "_id" : ObjectId("598c32f455f0353f9e69ebf1"),
    "_class" : "...",
    "timestamp" : ISODate("2017-08-10T10:17:00.000Z"),
    "value" : 10.1
}
...
{
    "_id" : ObjectId("598c32f455f0353f9e69ebz2"),
    "_class" : "...",
    "timestamp" : ISODate("2017-08-10T10:18:00.000Z"),
    "value" : 2.1
}

Let's say I have 60 documents like this, with timestamps inside a 1-minute window (from 10:17:00 to 10:18:00), and I want to obtain one document:

{
    "_id" : ObjectId("598c32f455f0353f9e69e231"),
    "_class" : "...",
    "start_timestamp" : ISODate("2017-08-10T10:17:00.000Z"),
    "end_timestamp" : ISODate("2017-08-10T10:18:00.000Z"),
    "average_value" : **average value of those documents**
}

Is it possible to perform this kind of transformation? Can I retrieve one 1-minute window of data at a time?

An approach that takes all the documents and compares their timestamps looks slow and inefficient.

Thanks in advance.

  • Can you be more specific? So you want to aggregate all documents that fall inside one minute? What counts as a minute: from 0-60 s, or, say, from 1:30 to 2:30? – jojo_Berlin Sep 06 '17 at 07:36
  • Yes, all documents that are inside one minute. A minute is from 1:30 to 2:30 – Razvan Sep 06 '17 at 07:44
  • the solution is here https://stackoverflow.com/questions/41711716/how-to-aggregate-over-rolling-time-window-with-groups-in-spark ; basically: define a time window, load the data into a DataFrame, order it by timestamp, and apply the window (a sketch along these lines follows below this list) – jojo_Berlin Sep 06 '17 at 08:19
  • You can also do all of this with the Mongo aggregation framework; there is no need for Spark to compute an average value of 60 documents (see the second sketch below).... – Bameza Sep 06 '17 at 08:30
  • Yeah, I agree with @Bameza's answer – jojo_Berlin Sep 06 '17 at 08:40
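
Following jojo_Berlin's suggestion, here is a minimal sketch of the DataFrame approach in Java. It assumes the MongoDB Spark connector 2.x (hence the "com.mongodb.spark.sql.DefaultSource" source name and the spark.mongodb.input.uri property); the URI, database, and collection names are placeholders:

import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.window;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MinuteWindowAverage {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("MinuteWindowAverage")
                // Placeholder URI: point this at your own database/collection.
                .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/mydb.measurements")
                .getOrCreate();

        // Load the collection as a DataFrame through the MongoDB Spark connector.
        Dataset<Row> docs = spark.read()
                .format("com.mongodb.spark.sql.DefaultSource")
                .load();

        // Group the rows into 1-minute tumbling windows keyed on "timestamp"
        // and average "value" within each window; Spark does the bucketing,
        // so no pairwise timestamp comparison is needed.
        Dataset<Row> averaged = docs
                .groupBy(window(col("timestamp"), "1 minute"))
                .agg(avg(col("value")).alias("average_value"))
                .select(
                        col("window.start").alias("start_timestamp"),
                        col("window.end").alias("end_timestamp"),
                        col("average_value"));

        averaged.show(false);
        spark.stop();
    }
}

Note that window() also accepts a slide duration and a start-time offset as optional third and fourth arguments, so minute buckets that are offset from the full minute (the 1:30-to-2:30 case from the comments) can be expressed as window(col("timestamp"), "1 minute", "1 minute", "30 seconds").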
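
And here is a sketch of the pure-MongoDB route Bameza mentions, using the aggregation framework through the Java driver (3.x API). Host, database, and collection names are placeholders again, and truncating the timestamp to the minute with $dateToString is just one possible choice of group key:

import static com.mongodb.client.model.Accumulators.avg;
import static com.mongodb.client.model.Accumulators.max;
import static com.mongodb.client.model.Accumulators.min;
import static com.mongodb.client.model.Aggregates.group;

import java.util.Arrays;

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MinuteBucketAverage {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost");
        MongoCollection<Document> coll =
                client.getDatabase("mydb").getCollection("measurements");

        // Group key: the timestamp truncated to the minute, rendered as a
        // string by $dateToString. Every document in the same minute gets
        // the same key.
        Document minuteKey = new Document("$dateToString",
                new Document("format", "%Y-%m-%dT%H:%M")
                        .append("date", "$timestamp"));

        // One output document per minute: the earliest and latest timestamps
        // actually seen in the bucket, plus the average value.
        for (Document d : coll.aggregate(Arrays.asList(
                group(minuteKey,
                        min("start_timestamp", "$timestamp"),
                        max("end_timestamp", "$timestamp"),
                        avg("average_value", "$value"))))) {
            System.out.println(d.toJson());
        }

        client.close();
    }
}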

0 Answers