I have a MongoDB document design that stores array data in 6 of its top-level fields. The document stores IoT data collected from a specific set of sensors for the day, and it is updated very frequently throughout the day (once every 2 seconds). Each new sensor packet appends data to the end of each of the 6 arrays, which means that by the end of the day each array can have a maximum of 43200 values (though in practice it never reaches that).

The basic structure is as follows:

{
  _id: string,
  tracker: string,
  startTime: Date,
  endTime: Date,
  sensor1: number[],
  sensor2: number[],
  path: { 
    type: "Linestring",
    coordinates: number[][],
  },
  times: Date[],
  ...
}
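
For context, the append we run every ~2 seconds looks roughly like this (the field names come from the structure above; the collection name trackerDays, the dayDocId variable, and the packet object are illustrative assumptions, not our literal production code):

// Runs once per incoming sensor packet (~every 2 seconds).
// Appends one value to each array and bumps endTime.
db.trackerDays.updateOne(
  { _id: dayDocId },
  {
    $push: {
      sensor1: packet.sensor1,
      sensor2: packet.sensor2,
      "path.coordinates": [packet.lon, packet.lat],
      times: packet.time
    },
    $set: { endTime: packet.time }
  }
);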

Recently it seems as though our database has been "struggling with high IOPS", which we think might be caused by constantly appending to these arrays. According to the MongoDB consultant this was the cause of several primary restarts in the past few months, even though our tier allows 3000 IOPS and we only peak at around 2000. We are currently running a replica set on Atlas on an M30 tier.

MongoDB suggests that unbounded arrays should be avoided because of the way documents are moved on disk when they outgrow their allocated space. This seems to have been a noticeable problem for the MMAPv1 storage engine, but according to the docs it should be much less of an issue with the WiredTiger storage engine, which has been the default since MongoDB 3.2.

So I guess my questions are the following:

  1. Can someone confirm whether or not the WiredTiger storage engine also moves documents around on disk once they outgrow their allocated size? How often would this happen, and can it have a major effect? The docs also state that storage is allocated in powers of 2; if that is the case, shouldn't there be only a minimal number of "document moves" for a single document, since the allocated size grows exponentially with document size? (A shell sketch for measuring this growth follows after these questions.)

  2. Taking into account the fact that I still need access to unprocessed/uncomputed data, what would be a better way to store this data if any?
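
If it helps to quantify the growth mentioned in question 1, a document's current BSON size can be checked from the shell like this (a sketch; trackerDays and dayDocId are placeholders, and the $bsonSize operator requires MongoDB 4.4+):

// BSON size in bytes of a single day-document as it grows.
db.trackerDays.aggregate([
  { $match: { _id: dayDocId } },
  { $project: { sizeBytes: { $bsonSize: "$$ROOT" } } }
]);

// Collection-level view: average document size and on-disk storage.
db.trackerDays.stats();  // see avgObjSize and storageSize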

Thanks in advance!

Loupi

1 Answer

Updating one document => the whole document is loaded into memory (you can run a simple benchmark to test it).
When the document gets big => each update costs more.
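
A rough shell sketch of such a benchmark (the collection name and counts are just examples, not something from the question):

// Time $push updates against one document as its array grows.
db.bench.drop();
db.bench.insertOne({ _id: 1, values: [] });

for (let round = 1; round <= 10; round++) {
  const start = Date.now();
  for (let i = 0; i < 5000; i++) {
    db.bench.updateOne({ _id: 1 }, { $push: { values: i } });
  }
  print("array length ~" + (round * 5000) + ": " + (Date.now() - start) + " ms for 5000 pushes");
}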

Solution => keep the arrays smaller by reducing the time range each document covers.

You have a 1-day time range now; you could make it 5 hours or even 1 hour.
(To get all of a day's measurements you can group the documents afterwards.) I think that in your case just having a shorter time range => smaller arrays will be enough. One way to do it is to have one extra field per document, e.g. { id: 1, hour: 1 }, { id: 1, hour: 2 }, ..., and the new hour field should be indexed (see the sketch below).
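
A sketch of the hourly-bucket idea (the collection name, field names and values are illustrative):

// One document per tracker per hour instead of per day.
// upsert: true creates the hour bucket on the first packet; later packets append.
db.trackerHours.updateOne(
  { tracker: "tracker-1", day: ISODate("2021-09-21"), hour: 7 },
  {
    $push: { sensor1: 42, times: new Date() },
    $setOnInsert: { startTime: new Date() }
  },
  { upsert: true }
);

// Index so a full day can be reassembled quickly.
db.trackerHours.createIndex({ tracker: 1, day: 1, hour: 1 });

// Read back a whole day by fetching its buckets in order.
db.trackerHours.find({ tracker: "tracker-1", day: ISODate("2021-09-21") }).sort({ hour: 1 });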

As far as I know it does happen: documents are moved, but MongoDB has a way to do it fast by pre-allocating space. If you need more internal information you can also ask here, but I don't think that this is your problem, or that you will find a way to update big documents quickly (you update so often that the document size causes problems).

*Maybe there are better ways to do it than my solution; it's best to wait for other answers as well.

Takis
  • thanks for the answer. I have been thinking that this would be the best solution thus far. – Loupi Sep 21 '21 at 07:45
  • In MongoDB 5, Time Series Collections were added. I have never used them yet, but check them out if you want; maybe they are relevant here. – Takis Sep 21 '21 at 07:51
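
For reference, creating such a time series collection looks roughly like this (MongoDB 5.0+; the collection and field names are made up for illustration):

// One small document per reading; the server buckets them internally,
// so there are no application-managed growing arrays.
db.createCollection("sensorReadings", {
  timeseries: {
    timeField: "ts",       // timestamp of the measurement
    metaField: "tracker",  // identifies which tracker/sensor set it came from
    granularity: "seconds"
  }
});

db.sensorReadings.insertOne({
  ts: new Date(),
  tracker: "tracker-1",
  sensor1: 42,
  sensor2: 17,
  location: { type: "Point", coordinates: [18.42, -33.92] }
});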