I'm pretty new to CouchDB, and even after reading http://wiki.apache.org/couchdb/How_to_store_hierarchical_data (now deleted; see the latest archived copy), in particular the approach ‘Store the full path to each node as an attribute in that node's document’, it's still not clicking.

Instead of using the full path pattern as described in the wiki I'm hoping to keep track of children as an array of UUIDs and the parent as a single UUID. I'm leaning towards this pattern so I can maintain the order of children by their positions in the children array.

Here are some sample documents in CouchDB. Buckets can contain buckets and items; items can only contain other items. (UUIDs abbreviated for clarity.)

{
 _id: 3944,
 name: "top level bucket with two items",
 type: "bucket",
 parent: null,
 children: [8989, 4839]
}
{
 _id: 8989,
 name: "second level item with no sub items",
 type: "item",
 parent: 3944
}
{
 _id: 4839,
 name: "second level bucket with one item",
 type: "bucket",
 parent: 3944,
 children: [5694]
}
{
 _id: 5694,
 name: "third level item (has one sub item)",
 type: "item",
 parent: 4839,
 children: [5390]
}
{
 _id: 5390,
 name: "fourth level item",
 type: "item",
 parent: 5694
}

Is it possible to look up a document by an embedded document id within a map function?

function(doc) {
    if (doc.type == "bucket" || doc.type == "item") {
        emit(doc, null); // still working on my key value output structure
        if (doc.children) {
            for (var i in doc.children) {
                // can I look up a document here using ids from the children array?
                doc.children[i]; // pseudocode
                emit(); // the retrieved document would be emitted here
            }
        }
    }
}

In an ideal world, the final JSON output would look something like this:

{"_id":3944,
 "name":"top level bucket with two items",
 "type":"bucket",
 "parent":"",
 "children":[
     {"_id":8989, "name":"second level item with no sub items", "type":"item", "parent":3944},
     {"_id": 4839, "name":"second level bucket with one item", "type":"bucket", "parent":3944, "children":[
         {"_id":5694, "name":"third level item (has one sub item)", "type":"item", "parent": 4839, "children":[
             {"_id":5390, "name":"fourth level item", "type":"item", "parent":5694}
         ]}
     ]}
 ]
}
berg

2 Answers

You can find a general discussion on the CouchDB wiki.

I have no time to test it right now, but your map function should look something like this:

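function(doc) {
    if (doc.type === "bucket" || doc.type === "item") {
        // The node itself sorts first under its own id.
        emit([ doc._id, -1 ], 1);
        if (doc.children) {
            for (var i = 0, child_id; child_id = doc.children[i]; ++i) {
                // One row per child, keyed by its position in the children array.
                emit([ doc._id, i ], { _id: child_id });
            }
        }
    }
}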

You should query it with include_docs=true to get the documents. As the CouchDB documentation explains, if your map function emits an object value of the form {'_id': XXX} and you query the view with the include_docs=true parameter, CouchDB fetches the document with id XXX rather than the document that was processed to emit the key/value pair.

Add startkey=["3944"]&endkey=["3944",{}] to get only the document with id "3944" and its children.
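As a rough sketch of how that query string could be assembled, the helper below JSON-encodes the two keys and URL-escapes them. The database name "mydb", the design document "tree", the view name "children", and the function name subtreeQueryPath are all placeholders invented for illustration; none of them come from the answer.

```javascript
// Builds the request path for one node plus its direct children.
// "mydb", "tree" and "children" are hypothetical names; substitute your own.
function subtreeQueryPath(rootId) {
    var params = [
        // Ask CouchDB to attach the document for each row.
        "include_docs=true",
        // View keys are JSON, so they are serialized and then URL-escaped.
        "startkey=" + encodeURIComponent(JSON.stringify([rootId])),
        // {} collates after any number, so this closes the range for rootId.
        "endkey=" + encodeURIComponent(JSON.stringify([rootId, {}]))
    ];
    return "/mydb/_design/tree/_view/children?" + params.join("&");
}
```

Each returned row should then carry a doc field: the node itself for the [id, -1] row, and the referenced child for the rows emitted inside the loop.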

EDIT: have a look at this question for more details.

Marcello Nuccio
  • Thanks for helping out Marcello. When I run the map function the output isn't nested as I was hoping, it's all flat. Any ideas? – berg May 26 '11 at 08:01
  • My answer is [here](http://stackoverflow.com/questions/6084741/how-to-merge-view-collation-into-useful-output-in-couchdb/6094540#6094540). However, I do not recommend it. What is the advantage of a nested list? The flat list is ordered so that every "item" or "bucket" is immediately followed by its children in the requested order. It is very easy and efficient to traverse this list. Why do you require a nested list? Maybe I can give you a better solution. – Marcello Nuccio May 26 '11 at 15:12
  • I was hoping to use the results directly in my client side JavaScript code which is expecting the data to come back nested. But after reading the question you linked to it appears this goes against the grain of CouchDB so I will plan on doing this client side! Thanks again I will mark this as the answer! – berg May 26 '11 at 17:39
  • There's nothing inherently wrong in nesting the children within the parent. The best approach is strongly dependent on the use case, because CouchDB gives you a lot of freedom compared to an SQL DB. – Marcello Nuccio May 27 '11 at 09:53
  • I tried this view on the data (with `include_docs` true), but the inner emit always produces an empty object as the value and the ordering appears to be arbitrary. – James Hopkin Feb 13 '14 at 17:03
  • @JamesHopkin, on what data are you trying the view? [View collation](https://wiki.apache.org/couchdb/View_collation) is well defined in CouchDB. – Marcello Nuccio Feb 14 '14 at 11:14
  • I tried a clean database with the 5 documents shown in the question. I misunderstood a little, so please ignore what I said about the order. What I am seeing is that the child entries, i.e. those emitted inside the loop, have an empty object as the value, and the 'doc' entry for the row is actually the parent. – James Hopkin Feb 17 '14 at 10:15
  • I guess I'm really asking what emitting a value of { _id: "" } does. It seems to have no effect in the results I'm seeing. – James Hopkin Feb 17 '14 at 11:24
  • @JamesHopkin, I've edited the answer with the relevant snippet of [the official documentation](http://docs.couchdb.org/en/latest/couchapp/views/joins.html). Hope it helps. – Marcello Nuccio Feb 18 '14 at 07:56
  • Ah, the problem was `child._id` in the view should say `child.toString()` instead. Works for me with that change. – James Hopkin Feb 18 '14 at 11:43

Can you output a tree structure from a view? No. CouchDB view queries return a list of values; there is no way to have them output anything other than a list. So you have to deal with your map returning the list of all descendants of a given bucket.

You can, however, plug a _list post-processing function after the view itself, to turn that list back into a nested structure. This is possible if your values know the _id of their parent — the algorithm is fairly straightforward, just ask another question if it gives you trouble.
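A minimal sketch of that regrouping step, assuming the flat list has already been reduced to an array of documents that each carry _id and parent. The function name nestByParent is made up for this example. It is plain JavaScript, so the same logic could run on the client or be adapted for the body of a _list function.

```javascript
// Rebuilds a nested tree from a flat array of docs that know their parent.
// nestByParent is a hypothetical helper, not part of CouchDB.
function nestByParent(docs, rootId) {
    var byId = {};
    var i, key, copy, node, parent;

    // First pass: shallow-copy every doc (dropping the id-only children
    // array) and index the copies by _id. The input is left untouched.
    for (i = 0; i < docs.length; i++) {
        copy = {};
        for (key in docs[i]) {
            if (key !== "children") {
                copy[key] = docs[i][key];
            }
        }
        byId[docs[i]._id] = copy;
    }

    // Second pass: hang each node under its parent, if the parent is present.
    for (i = 0; i < docs.length; i++) {
        node = byId[docs[i]._id];
        parent = byId[node.parent];
        if (parent) {
            parent.children = parent.children || [];
            parent.children.push(node);
        }
    }

    // Returns null when the requested root is not in the list.
    return byId[rootId] || null;
}
```

Children end up in the order in which the rows arrive, so the ordering produced by the view carries over into the nested result.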

Can you grab a document by its id in the map function? No. There's no way to grab a document by its identifier from within CouchDB. The request must come from the application, either in the form of a standard GET on the document identifier, or by adding include_docs=true to a view request.

The technical reason for this is pretty simple: CouchDB only runs the map function when the document changes. If document A was allowed to fetch document B, then the emitted data would become invalid when B changes.

Can you output all descendants without storing the list of parents of every node? No. CouchDB map functions emit a set of key-value-id pairs for every document in the database, so the correspondence between the key and the id must be determined based on a single document.

If you have a four-level tree structure A -> B -> C -> D but only let a node know about its parent and children, then none of the nodes above know that D is a descendant of A, so you will not be able to emit the id of D with a key based on A and thus it will not be visible in the output.

So, you have three choices:

  • Grab only three levels (this is possible because B knows that C is a descendant of A), and grab additional levels by running the query again.
  • Somehow store the list of descendants of every node within the node (this is costly).
  • Store the list of parents of every node within the node.
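The last option could look roughly like the map function below. It assumes each document carries a hypothetical ancestors array listing the ids from the root down to its direct parent (for example [3944, 4839, 5694] on document 5390); that field is not in the question's documents and would have to be maintained by the application. The emit stub only exists so the function can be tried outside CouchDB.

```javascript
// Stand-in for CouchDB's built-in emit, for local experiments only.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// Map function sketch: one row under the node's own id, plus one row under
// every ancestor id. "ancestors" is an assumed field, ordered root first.
function mapByAncestor(doc) {
    if (doc.type !== "bucket" && doc.type !== "item") {
        return;
    }
    var ancestors = doc.ancestors || [];
    // Depth 0: the node itself.
    emit([doc._id, 0], null);
    for (var i = 0; i < ancestors.length; i++) {
        // Depth is the distance from that ancestor down to this node.
        emit([ancestors[i], ancestors.length - i], null);
    }
}
```

Querying such a view with a startkey of [id] and an endkey of [id, {}], plus include_docs=true, should then return a node together with all of its descendants, grouped by depth. Sibling order within a level would still have to come from the children arrays.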
Victor Nicollet