1

I have mongo collection with data records coming from gps device in every 2-3 second, data sample is as shown here

1. {imei:123456.., port:8000,createdon:timestamp, updatedon : timestamp} 
2. {imei:223456.., port:8000,createdon:timestamp, updatedon : timestamp} 
3. {imei:123456.., port:8000,createdon:timestamp, updatedon : timestamp} 
4. {imei:323456.., port:8000,createdon:timestamp, updatedon : timestamp}
5. {imei:223456.., port:8000,createdon:timestamp, updatedon : timestamp} 
6. {imei:223456.., port:8000,createdon:timestamp, updatedon : timestamp} 
7. {imei:323456.., port:8000,createdon:timestamp, updatedon : timestamp}

... around 2 billion records

my requirement is to create aggregate query to find all imeis for which time difference between 2 consecutive records is >= given time in minutes for a particular date time.

like for imei:223456.. above records will be as below:

- a. {imei:223456.., port:8000,createdon:timestamp, updatedon : timestamp} 
- b. {imei:223456.., port:8000,createdon:timestamp, updatedon : timestamp}
- c. {imei:223456.., port:8000,createdon:timestamp, updatedon : timestamp} 

if for a given date time these records are considered then time diff b.createdon - a.createdon => let say 2 min similarly c.createdon - b.createdon = 3 min

if given time is 3 minutes then imei:223456.. is considered as there is a matching time difference record, otherwise it is dropped from the output.

I have tried query as following but its not giving the correct result

collection
  .aggregate([{
      $match: {
        "createdon": // filter out all items that are inside a range 
        {
          $gte: start_dt,
          $lte: end_dt
        }
      }
    },
    {
      $group: {
        _id: "$imei",
        max_time: {$max: "$createdon"},
        min_time: {$min: "$createdon"}
      }
    },
    {
      $set: {
        difference: {
          $divide: [{$subtract: ["$max_time", "$min_time"]}, 60 * 1000]
        } // get difference in minutes
      }
    },
    {
      $match: { // filter out all items that are having difference >= 3 min or any given time
        difference: {$gte: minutes}
      }
    }
  ]);

request params start_dt= new Date('02-24-2023').getTime(), end_dt= new Date('02-25-2023').getTime(), and time diff minutes = 3, as a result I should get all IMEIs for those any 2 consecutive records difference is exceeding given time 3 min

rickhg12hs
  • 10,638
  • 6
  • 24
  • 42
  • 1
    What is the [BSON type](https://www.mongodb.com/docs/manual/reference/bson-types/) of `timestamp`? For example, is it a 32/64-bit integer (milliseconds since some epoch?), double (milliseconds ...), date, timestamp, etc.? – rickhg12hs Apr 30 '23 at 20:25
  • Please provide the sample data and your examples including the `createdon` values – nimrod serok May 01 '23 at 06:53
  • It is not clear what is the expected output in a case of a gap. For example, for allowed 3 minutes gap, what if we have: `1, 2, 3, 4, 18, 19, 20`? Should we keep only `18, 19, 20`? – nimrod serok May 01 '23 at 07:51
  • Sample json is { "imei": 1234567890123456, "avldt": 1681839734000, "createdon": 1682499349758, its time in millisecond "updatedon": 1682499564900 } – mohammad Idris May 01 '23 at 12:19
  • in case of 1, 2, 3, 4, 18, 19, 20 we need to have all values >=3 means , 3, 4, 18 ,19 ,20 etc, out put should have list of records of this type {imei:123....., max_time: source, min_time :timestamp, timediff:} max_time and min_time = createdon timestamp of records whose time difference we have taken in this case , 3, 4, 18 etc. – mohammad Idris May 01 '23 at 12:29
  • I don't mean that the difference is `1, 2, 3, 4, 18, 19, 20`, I mean in case that this is the minutes value of the `createdon`. For example: `ISODate("2023-05-01T13:01:00.000Z"), ISODate("2023-05-01T13:02:00.000Z"), ISODate("2023-05-01T13:04:00.000Z"), ISODate("2023-05-01T13:18:00.000Z")...` – nimrod serok May 01 '23 at 12:41
  • here is the output of above query just showing one sample here { "_id": 12345678901234566 =imei "max_time": 1682499619786, "min_time": 1682456443752, "difference": 719.606 in min } since output data is getting bigger, so I am thinking to keep only one record having max diff in output array, for example in case of 1, 2, 3, 4, 18, 19, 20 we can have a record of difference 20 or more, for one imei, this way we will have single record of all those imeis where time difference is bigger than given threshold. idea is to fetch all imeis for those time difference is more than given value – mohammad Idris May 01 '23 at 12:47
  • yes you are correct , since data is coming from gps devices it could be possible that for some devices/ imeis time difference is consistent , may be 1 min or 2 min and for some devices it is inconsistent , so the idea is to fetch all those imeis where time difference is not consistent – mohammad Idris May 01 '23 at 12:53

1 Answers1

0

One option to find imeis where the gap between consecutive documents is greater than minutes is to use $setWindowFields:

db.collection.aggregate([
  {$match: {
      createdon: {
        $gte: start_dt,
        $lte: end_dt
      }
  }},
  {$setWindowFields: {
      partitionBy: "$imei",
      sortBy: {createdon: 1},
      output: {
        prev: {
          $first: "$createdon",
          window: {documents: [-1, 0]}
        }
      }
  }},
  {$match: {
      $expr: {$gte: [
          {$divide: [{$subtract: ["$createdon", "$prev"]}, 60000]},
          minutes
      ]}
  }},
  {$group: {_id: "$imei"}}
])

See how it works on the playground example

nimrod serok
  • 14,151
  • 2
  • 11
  • 33