I just wanted an opinion, or at least a rule of thumb, on which is better in a database structure for CouchDB. Is it better to have all related data for an item in a single document, or to spread parts of all items across many documents?

Let me illustrate what I mean with an example. I currently log 4 events from our system at 1-minute intervals; let's call them event_1, event_2, event_3 and event_4. Data is stored for each of the 4 events regardless of value (you'll always get a value, even if everything is okay).

Option 1: Group by event, and append each new timestamp/value pair to that event's document:

{
    event_1: [ 
        { timestamp, value },
        { timestamp, value },
        { timestamp, value },
        ...etc
    ]
},
{
    event_2: [ 
        { timestamp, value },
        { timestamp, value },
        { timestamp, value },
        ...etc
    ]
},
{
    event_3: [ 
        { timestamp, value },
        { timestamp, value },
        { timestamp, value },
        ...etc
    ]
}
...etc
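To make Option 1 concrete, here is a minimal sketch, assuming a plain Python dict stands in for the database and a hypothetical helper `append_reading` does the saving (no real CouchDB client is used). It shows the property that matters: a plain CouchDB update (a PUT carrying the current `_rev`) sends the whole document, so every save carries the entire, ever-growing array. The timestamps and values are made-up sample data.

```python
import json

# Hypothetical in-memory stand-in for the database: _id -> document.
db: dict[str, dict] = {}

def append_reading(event: str, timestamp: str, value: float) -> int:
    """Option 1: append a reading to the event's document and 'save' it.

    Returns the size in bytes of the JSON this save would send. A plain
    CouchDB update sends the whole document, so the full array goes out
    every time.
    """
    doc = db.setdefault(event, {"_id": event, "readings": []})
    doc["readings"].append({"timestamp": timestamp, "value": value})
    return len(json.dumps(doc).encode("utf-8"))

# Sample values (made up) for three consecutive minutes of event_1.
sizes = [
    append_reading("event_1", f"2011-06-30T22:{minute:02d}:00Z", value)
    for minute, value in enumerate([1.0, 2.0, 3.0])
]
print(sizes)  # each save is larger than the previous one
```

After a day at 1-minute intervals, each of those saves would carry well over a thousand readings just to add one more.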

Option 2: Keep a huge list of documents, each holding the latest values (which is how they're actually delivered from the system):

{
    timestamp,
    events: {
        event_1: value,
        event_2: value,
        event_3: value,
        event_4: value
    }
},
{
    timestamp,
    events: {
        event_1: value,
        event_2: value,
        event_3: value,
        event_4: value
    }
},
{
    timestamp,
    events: {
        event_1: value,
        event_2: value,
        event_3: value,
        event_4: value
    }
}
...etc
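A minimal sketch of building one Option 2 document per delivery. The field names (`timestamp`, `events`) and the choice of the ISO-8601 timestamp as `_id` are assumptions for illustration, not something the question specifies; the helper `make_reading_doc` is hypothetical and the values are made up.

```python
import json

def make_reading_doc(timestamp: str, values: dict[str, float]) -> dict:
    """Option 2: one small, write-once document per delivery from the system.

    Using the ISO-8601 timestamp as _id is an assumption of this sketch: it
    keeps documents in time order in _all_docs, but two deliveries with the
    same timestamp would then conflict.
    """
    return {"_id": timestamp, "timestamp": timestamp, "events": dict(values)}

# Sample values (made up) for one minute's delivery.
doc = make_reading_doc(
    "2011-06-30T22:00:00Z",
    {"event_1": 1.0, "event_2": 0.0, "event_3": 5.5, "event_4": 2.0},
)
print(json.dumps(doc))
```

Each minute produces a new document like this one; existing documents are never updated, so no further revisions of old data accumulate.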

I'm currently using the 2nd option, but was just curious to see people's opinions on what would be considered best practice. I'm starting to think that Option 1 might be better because of the way I am reporting: results are grouped by event (shown in a line graph for each event).
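Grouping results by event does not by itself require Option 1. In CouchDB a view can re-key the same Option 2 documents: a JavaScript map function emits one row per event with a key like `[event, timestamp]`, and a query with `startkey=["event_1"]` and `endkey=["event_1", {}]` returns that event's series in time order. The Python below is only a simulation of that idea (sorting emitted rows by key and filtering one event's key range), not CouchDB's view engine; the function names and sample documents are hypothetical.

```python
from typing import Iterator

Row = tuple[tuple[str, str], float]

def map_reading(doc: dict) -> Iterator[Row]:
    # Like a view's map function: emit one row per event, keyed by (event, timestamp).
    for event, value in doc["events"].items():
        yield (event, doc["timestamp"]), value

def build_view(docs: list[dict]) -> list[Row]:
    rows = [row for doc in docs for row in map_reading(doc)]
    rows.sort(key=lambda row: row[0])  # a view index is ordered by key
    return rows

def series_for(rows: list[Row], event: str) -> list[tuple[str, float]]:
    # Stands in for a key-range query over [event, ...]: one event's time series.
    return [(key[1], value) for key, value in rows if key[0] == event]

# Sample Option 2 documents (made up), deliberately out of order.
docs = [
    {"timestamp": "2011-06-30T22:01:00Z", "events": {"event_1": 2.0, "event_2": 7.0}},
    {"timestamp": "2011-06-30T22:00:00Z", "events": {"event_1": 1.0, "event_2": 6.0}},
]
print(series_for(build_view(docs), "event_1"))
```

A real view index is maintained incrementally as documents arrive, so the per-event line graph becomes one range query instead of a scan over every document.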

Jonathan Hall
crawf

1 Answer

I would definitely prefer your Option 2.

Since CouchDB keeps old revisions of its documents on disk, Option 1 would consume a huge amount of storage: each new value means saving a new revision that contains the new value and also a copy of all the old ones. With Option 2 you only store the new values, without touching the old ones.
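To put the answer's point in rough numbers, here is a deliberately simplified model, assuming old revisions stay on disk until compaction runs and counting only readings (no per-document or per-revision overhead, and nothing about CouchDB's actual file format). The function name is hypothetical.

```python
def readings_on_disk_before_compaction(n: int) -> tuple[int, int]:
    """Rough count of stored readings after n appends, for a single event.

    Simplified model: counts readings only and ignores per-document and
    per-revision overhead as well as CouchDB's actual file format.
    """
    # Option 1: revision k of the event document holds k readings, and the
    # older revisions stay on disk until compaction runs.
    option_1 = n * (n + 1) // 2
    # Option 2: each reading is written exactly once, in a new document.
    option_2 = n
    return option_1, option_2

# One day of one event at 1-minute intervals.
print(readings_on_disk_before_compaction(24 * 60))  # → (1037520, 1440)
```

Option 1 grows quadratically between compactions while Option 2 grows linearly; in practice Option 2's per-document overhead narrows the gap somewhat, but not the shape of the curve.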

phlogratos
  • @phlogratos to clarify, CouchDB only holds on to old revisions of a document until a compaction is run. – Matt Passell Jun 30 '11 at 22:46
  • +1. CouchDB committer Chris Anderson says "CouchDB likes tall lists, not fat lists." Imagine your documents in a text file, one per line. Having a few very large documents would be a fat list. Having many very small documents would be a tall list. – JasonSmith Jul 01 '11 at 03:05
  • @jhs: True, but I have found that the most important rule is: "keep together everything you use together". Small documents are very efficient, but this efficiency is lost if you must retrieve hundreds of documents to do something useful. As always, you MUST do some tests, because the answer depends on your use case. – Marcello Nuccio Jul 01 '11 at 08:19
  • @jhs, @Matt, @Marcello, interesting points, guys. I think this warrants a few test cases. I'm not seeing any noticeable slowness from using option 2 (currently 300,000 documents), but that might change once there are a few million. – crawf Jul 01 '11 at 14:28
  • It took me a long time to recognize that option 2 is the best, and the one without headaches. – Slim_user71169 Sep 22 '15 at 04:29
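The "tall lists, not fat lists" picture from the comments can be made concrete with a small sketch: serialize each layout one document per line, as if in a text file, and compare the number of lines with the longest line. The document shapes, the `as_lines` helper and the placeholder value 0 are assumptions for illustration only.

```python
import json

def as_lines(docs: list[dict]) -> list[str]:
    # One document per line, as in the "text file" picture from the comment.
    return [json.dumps(doc) for doc in docs]

events = ["event_1", "event_2", "event_3", "event_4"]
timestamps = [f"2011-06-30T22:{m:02d}:00Z" for m in range(60)]  # one sample hour

# Option 1: a "fat" list, with few lines, each growing with every reading.
fat = as_lines([
    {"_id": e, "readings": [{"timestamp": t, "value": 0} for t in timestamps]}
    for e in events
])
# Option 2: a "tall" list, with one short line per minute.
tall = as_lines([{"_id": t, "events": {e: 0 for e in events}} for t in timestamps])

print("fat: ", len(fat), "lines, longest", max(map(len, fat)), "chars")
print("tall:", len(tall), "lines, longest", max(map(len, tall)), "chars")
```

The fat list's line count stays fixed while every line keeps growing; the tall list's lines stay the same size and only their number grows, which is the shape CouchDB handles well.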