I am new to working with data.

I have a lot of time-based data.

There is one data row for every 15 minutes. Should I precompute the totals for every hour, day, and month and store them in the database?

If I do, would this schema be good?

{
   _id: "joe",
   name: "Joe Bookreader",
   "time min": [
                {
                  time: "1",
                  steps: "10"
                },
                {
                  time: "2",
                  steps: "4"
                }
              ],
   "time day": [
                {
                  time: "1",
                  steps: "30"
                },
                {
                  time: "2",
                  steps: "30"
                }
              ]
}

If you have any advice on how I can improve my data modeling knowledge with document databases, I would be really grateful.

Danny Varod
Supra

1 Answer

For a minute, step away from the programmatic approach to the problem and think about the task at hand.

How are you going to use the data after you have stored it? When you use it, is it important to know the exact number of steps for a particular user, or do you want to see the big picture across particular points in time?

If you care about the per-user perspective, then your schema above will work. On the other hand, if you want to run global reports, such as how far along users were on average (or in total) during a certain time, then I would opt for a schema where the document is the time (a point in time or a time range), while user and steps are its properties.
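To make the time-keyed layout concrete, here is a minimal sketch in plain Python rather than in any particular database. The field names `time`, `user`, and `steps`, the second user, and the helper `steps_per_slot` are all assumptions made for illustration; only "joe" and his step counts come from the question.

```python
from collections import defaultdict

# Hypothetical time-keyed documents: one document per (time slot, user).
# Steps are stored as numbers here (an assumption; the question stores strings).
samples = [
    {"time": 1, "user": "joe", "steps": 10},
    {"time": 1, "user": "ann", "steps": 6},
    {"time": 2, "user": "joe", "steps": 4},
    {"time": 2, "user": "ann", "steps": 8},
]

def steps_per_slot(docs):
    """Return the total and average steps across all users for each time slot."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for doc in docs:
        totals[doc["time"]] += doc["steps"]
        counts[doc["time"]] += 1
    return {
        slot: {"total": totals[slot], "average": totals[slot] / counts[slot]}
        for slot in totals
    }

report = steps_per_slot(samples)
print(report)
```

With this shape, a global report is a single pass grouped by `time`, whereas the per-user schema would require unpacking every user's embedded arrays first.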

Another important concept in databases is not to statically store data that can be calculated on the fly. As with any rule, there are some exceptions. One is cached values that are short-lived and will not have a major effect on your application if they are incorrect. Another is reports: you produce a report for the user based on current values and store it. If the user wants fresh data, the user re-runs the report. (I am sure there are a few others.)

But in most cases, the risk of serving stale or wrong data, and of wrong decisions being made based on it, will outweigh the performance benefit of avoiding extra calculations.

The reason I mention this is that you are storing both time min and time day. If time day can be calculated from time min, you should not store it in the database, but rather calculate it on the fly. You can write queries that produce the time day result without using any extra computational power on your application node. All computations will be done on the data node, which is usually more efficient than pulling the raw rows to a compute node and avoids the network cost of transferring them.
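As a rough illustration of deriving the coarser totals from the 15-minute rows, here is a sketch under stated assumptions: each sample carries a real timestamp and a numeric step count, and the names `rollup`, `ts`, `user`, and `DAILY_TOTALS_PIPELINE` are hypothetical. The pipeline constant shows how the same grouping might be written as a MongoDB aggregation (assuming MongoDB 5.0 or later for `$dateTrunc`); it is not executed here, and the Python function only mirrors the logic locally.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical server-side equivalent, assuming documents shaped like
# {"user": ..., "ts": <date>, "steps": <number>}. Not executed in this sketch.
DAILY_TOTALS_PIPELINE = [
    {"$group": {
        "_id": {
            "user": "$user",
            "day": {"$dateTrunc": {"date": "$ts", "unit": "day"}},
        },
        "steps": {"$sum": "$steps"},
    }},
]

def rollup(samples, unit):
    """Sum steps per hour or per day from (timestamp, steps) pairs."""
    totals = defaultdict(int)
    for ts, steps in samples:
        if unit == "hour":
            bucket = ts.replace(minute=0, second=0, microsecond=0)
        elif unit == "day":
            bucket = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError(f"unsupported unit: {unit}")
        totals[bucket] += steps
    return dict(totals)

# Two hours of made-up 15-minute samples, 5 steps each.
start = datetime(2024, 1, 1, 0, 0)
samples = [(start + timedelta(minutes=15 * i), 5) for i in range(8)]

hourly = rollup(samples, "hour")
daily = rollup(samples, "day")
print(hourly)
print(daily)
```

Because the hourly and daily figures are always recomputed from the 15-minute rows, they cannot drift out of sync with the source data.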

I realize this post is a bit old, but I hope my answer will help someone.

Dmitri Sandler