
Imagine you have millions of users who perform transactions on your platform. Assuming each transaction is a document in your MongoDB collection, there would be millions of documents generated every day, exploding your database in no time. I have received the following suggestions from friends and family.

  1. Having a TTL index on the documents - This won't work because we need those documents stored somewhere so that they can be retrieved at a later point in time when the user asks for them.
  2. Sharding the collection with the timestamp as the shard key - This doesn't let us control the time frame for which the data should be backed up.

I would like to understand and implement a strategy similar to what banks follow. They keep your transactions up to a certain point (e.g. 6 months), after which you have to request them via support or some other channel. I assume they follow a hot/cold storage pattern, but I am not completely sure about it.

The entire point is to manage the transaction documents by backing up or moving the older records, on a daily basis, to another place where they can still be read from. Any idea how that is possible with MongoDB?
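For concreteness, a nightly job along these lines is roughly what I am imagining; this is only a sketch, and the database and collection names (mydb, transactions, transactions_cold) are placeholders:

```bash
#!/bin/bash
# Sketch of a nightly hot/cold move: copy transactions older than 180 days
# into a "cold" collection, then delete them from the "hot" one.
# All names here are placeholders; at real scale the toArray() call below
# would need to be replaced with batched reads.
mongo mydb --quiet --eval '
  var cutoff = new Date(Date.now() - 180 * 24 * 60 * 60 * 1000);
  db.transactions.find({ transactionTimestamp: { $lt: cutoff } })
    .toArray()
    .forEach(function (doc) {
      db.transactions_cold.insert(doc);          // copy to cold storage
      db.transactions.remove({ _id: doc._id });  // remove from hot storage
    });
'
```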


Update: Sample document (please note that a few other keys have been redacted from the document):

{
    "_id" : ObjectId("5d2c92d547d273c1329b49f0"),
    "transactionType" : "type_3",
    "transactionTimestamp" : ISODate("2019-07-15T14:51:54.444Z"),
    "transactionValue" : 0.2,
    "userId" : ObjectId("5d2c92f947d273c1329b49f1")
}
Abhay Pai

1 Answer


First, create a collection where you want to save all the records. (Taking the sample you posted, let's say these entries are stored in a collection named A.)

After that, create a backup every day at midnight; after the backup succeeds, restore it into a new collection whose name carries the timestamp.

Once the entries are safely stored in the archive collection, you can truncate the original collection.

With this approach the live collection stays small, while you still have all the records.
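A minimal sketch of that flow, assuming a local mongod, a database named mydb, and the hot collection A (these names and the /backups path are just examples):

```bash
#!/bin/bash
# Dump the hot collection, restore it into a date-stamped archive
# collection, and truncate the hot collection only if both steps worked.
set -e  # abort immediately if any command below fails
STAMP=$(date +%Y%m%d)
DUMP_DIR="/backups/A-$STAMP"

# 1. Dump the hot collection to disk.
mongodump --db mydb --collection A --out "$DUMP_DIR"

# 2. Restore the dump into a date-stamped archive collection.
mongorestore --db mydb --collection "A_$STAMP" "$DUMP_DIR/mydb/A.bson"

# 3. Truncate the hot collection (only reached if steps 1 and 2 succeeded).
mongo mydb --quiet --eval 'db.A.remove({})'
```

Note that documents inserted between step 1 and step 3 would be lost by a blanket remove({}); in practice you would record the cutoff timestamp or _id from the dump and delete only documents at or below it.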

Aks
  • The strategy sounds good. But at scale the issue is that the backup itself takes time, and by the time we truncate the collection, the backup script would be triggered again, creating multiple overlapping backup instances. – Abhay Pai Jul 16 '19 at 14:57
  • You can write a shell script that uses MongoDB's native dump tooling, which I would guess completes in a few seconds. To check whether the collection has any values, you can call `cursor.count()`; if it is greater than zero, perform your task. – Aks Jul 17 '19 at 07:28
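A sketch combining both comments, using flock(1) to prevent overlapping runs and the count check to skip empty collections (the lock-file path and the mydb/A names are the same placeholders as above):

```bash
#!/bin/bash
# Guarded nightly dump: exit if a previous run still holds the lock,
# and only dump when the hot collection actually contains documents.
(
  flock -n 9 || exit 1  # another run is still in progress; bail out

  COUNT=$(mongo mydb --quiet --eval 'db.A.find().count()')
  if [ "$COUNT" -gt 0 ]; then
    mongodump --db mydb --collection A --out "/backups/A-$(date +%Y%m%d)"
  fi
) 9>/var/lock/mongo-archive.lock
```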