MongoDB Schema Design (nested array vs separate collection)

Question

I'm writing client management web application. I'm trying to figure out the right way to manage my clients-payments relations. Once a day application is sending request to another API and synchronizing amount of payments for each client which I store in my database. I constantly need to run reports on payments (amount of payments) based on type of client (contract_type, sale_date and so on). I already have a clients collection. I trying to choice between two schema:

{
      "client_id": "asdf123",
      "client_last_name": "BB",
      "address": "123 Main St",
      "city": "ATLANTA",
      "payments_history": [
        {
          "contract_number": "asdf123",
          "payment_date": ISODate("2012-09-02T07:00:00.0Z"),
          "amount": 103.33,
          "payment_number": NumberInt(1)
        },
        {
          "contract_number": "asdf123",
          "payment_date": ISODate("2012-09-30T07:00:00.0Z"),
          "amount": 103.33,
          "payment_number": NumberInt(2)
        },
        {
          "contract_number": "asdf123",
          "payment_date": ISODate("2012-11-04T07:00:00.0Z"),
          "amount": 103.33,
          "payment_number": NumberInt(3)
        }
      ]
  }

Versus creating separate collection "payments", where each document is a payment . I feel that it is better to separate those kind of data, since it will grow every single client document to enormous amount of data with each query (which will still take a lot of memory if I'm choosing particular fields). But on the other hand I won't be able to run the aggregation reports ( since it based on data from two different collections). What is the best approach? Should I separate them and do aggregation with two different queries on server side (php)?

Honestly, I can't see the compelling case for using MongoDB or NoSQL here, as this looks like a pretty straightforward relational use case. — Mike Brant, Apr 05 '13 at 22:02
There is complexity like unknown fields that could be changing during the time. What would be the right way using NoSQL? — Sergey Tsibel, Apr 05 '13 at 22:07
It sounds like you have already bumped up against some of the limitations of no SQL. But if like you said the individual records don't have fixed schema, it still may be you best solution. I will write up answer below. — Mike Brant, Apr 05 '13 at 22:29

score 2 · Accepted Answer · answered Apr 05 '13 at 22:35

Since it sounds like you will actually need to query against the payment data outside the context of the client (i.e. for aggregated reporting), I would not want to add each individual payment item to the client collection objects.

I would certainly create a payment object collection, and then either reference a payment key in the client object for each payment and the client key in the payment object, so you have a definitive way to relate one to the other in either direction, or have a third collection mapping clients to payments.

What is preferable here may really depend on your access pattern. For example, you may not even need such "foreign keys" on both set of objects if the lookup is always going to be in one direction for cases where you need to establish the relation.

what is the purpose of linking two collection documents with "foreign keys"(DBRef) if I can't access it during the aggregation? — Sergey Tsibel, Apr 05 '13 at 22:55

MongoDB Schema Design (nested array vs separate collection)

1 Answers1

Linked