1

For multiple reasons, I'm currently considering moving to a NoSQL DB to store/manage a set of "scorecards". As scorecard is a simple table, with rows for regions and columns for metrics/indicators. As simple scorecard document for one month could look like this:

{
  "month": 1,
  "headers": ["Region", "# of page views", "# of unique visitors"],
  "data": [
     ["Region 1", {"value": 1000, "previousValue": 800, "arrow": "up" }, {"value": 100, "previousValue": 110, "arrow": "down"}],
     ["Region 2", {"value": 500, "previousValue": 600, "arrow": "down" }, {"value": 10, "previousValue": 11, "arrow": "down"}]
  ]
}

resulting in this rendered table:

|  Region  | # of page views | # of unique visitors |
|----------|-----------------|----------------------|
| Region 1 | 1000 (↑)        | 100 (↓)              |
| Region 2 | 500 (↓)         | 10 (↓)               |

Every month, a new scorecard will be uploaded, i.e. a new document gets created. This scorecard might have a different structure, e.g.

{
  "month": 2,
  "headers": ["Region", "# of page views", "# of comments"],
  "data": [
     ["Region 1", {"value": 1100, "previousValue": 1000, "arrow": "up"}, {"value": 5, "previousValue": null, "arrow": null}],
     ["Region 3", {"value": 1500, "previousValue": null, "arrow": null},{"value": 1, "previousValue": null, "arrow": null}]
  ]
}

resulting in

|  Region  | # of page views | # of comments |
|----------|-----------------|---------------|
| Region 1 | 1100 (↑)        | 5 (-)         |
| Region 3 | 1500 (-)        | 1 (-)         |

When I now receive an update for the month 1 scorecard, I also need to update the previousValue (and the arrow) property in the month 2 scorecard. Given that the structure might have changed, how can I do this update most efficiently? How can I "linked" the cells?

In a RDBMS, I would have normalized the values in another table and just linked the valueIds. However, as any scorecard can have a lot of values, I assume that storing the valueIds in the scorecard documents and retrieving all values in a second step is really inefficient?!

I have not decided on the NoSQL DB provider. Current favorite is CouchDb, but can also be MongoDB or DocumentDB. The backend is implemented in ASP.NET WebAPI.

Peter Albert
  • 16,917
  • 5
  • 64
  • 88
  • 1
    I cant see the point for having the `previousValue`. Could be done by the database by selecting the previous month and calculating inapplication. On linked "cells", this could be helpfull: http://docs.ehealthafrica.org/couchdb-best-practices/#linked-documents General idea could be using _id's that are "known" before query, like `_id: 'stats-montly-2015-03'`. – h4cc Jun 28 '15 at 17:05
  • @JuliusBeckmann: the linked cells might work, thanks! but I guess there's no way to do a "join" in the DB so that the value is inserted in the document, right? i.e. need to do that in the client app... – Peter Albert Jun 29 '15 at 11:28

1 Answers1

4

Most NoSQL databases do not support JOINs between multiple records - so you are left with a few different ways to model relationships.

Normalizing

As you mentioned, you could leave a soft-link to another document... and then resolve the reference with a follow-up query.

Generally, you'd normalize data that you'd want to optimize for writes.

Note: In DocumentDB, you can reduce the cost associated with multiple network requests in the context of follow-up queries by using a stored procedure. This allows you to perform a sequence of operations as a single network request.

De-normalizing

Alternatively, you could embed related data as JSON objects. This eliminates the need to resolve references with follow-up queries; but may introduce complexities when mutating data (e.g. fanning out a write to multiple records).

Generally, you'd normalize data you'd want to optimize for reads.

Why not both? Hybrid approaches

You could also take a hybrid approach. De-normalize a subset of fields that a fairly static or frequently read; and then normalize fields that are frequently written or infrequently read.

References

For more in-depth info, I'd recommend checking out:

Community
  • 1
  • 1
Andrew Liu
  • 8,045
  • 38
  • 47