For a clustered CouchDB setup, should I just go ahead and use BigCouch?

Question

I've been looking into CouchDB's attachments feature. Basically, CouchDB lets you store binary file data inside database records, similar to MongoDB's GridFS. The project I want to build revolves heavily around file uploads, which I planned to store in CouchDB. That led me to research how CouchDB clusters data, so that as my database grows because of the file attachments, I can spread it across multiple servers. I was disappointed to find that CouchDB can't do this out of the box. The CouchDB guide says to use something called couchdb-lounge, but that project has gone untouched on GitHub for more than 2 years. I don't think I'd feel comfortable building on that.
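To make the attachments idea concrete, here is a minimal sketch of a document carrying one inline attachment. CouchDB accepts attachments inline under the `_attachments` key, with the payload base64-encoded; the document id, the `owner` field, and the helper name are made up for illustration, and nothing is actually sent to a server.

```python
import base64
import json


def doc_with_inline_attachment(doc_id, fields, filename, content_type, payload):
    """Return a CouchDB document body that carries one inline attachment."""
    doc = dict(fields)
    doc["_id"] = doc_id
    doc["_attachments"] = {
        filename: {
            "content_type": content_type,
            # Inline attachments travel base64-encoded inside the JSON body.
            "data": base64.b64encode(payload).decode("ascii"),
        }
    }
    return doc


upload = doc_with_inline_attachment(
    "upload-0001",        # hypothetical document id
    {"owner": "ryan"},    # hypothetical metadata field
    "hello.txt",
    "text/plain",
    b"hello",
)

# This JSON string is what would be PUT to /<database>/upload-0001.
body = json.dumps(upload)
```

The file then lives and replicates with the rest of the record, which is the appeal, and also why database size grows quickly with uploads.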

I found BigCouch, which appears to be a modified CouchDB that includes exactly the clustering functionality I need, except that it looks like it lags behind the current stable CouchDB release. I did read, in a press release from a year ago, that they're working on merging BigCouch into the official CouchDB, but I don't know what the timeline for that looks like.

As a third option, it looks like Couchbase Server 2 is also based on CouchDB but has clustering built in, among other features. I'm debating that as a viable option, too. It doesn't support file attachments, though.

The fact that BigCouch will land in CouchDB, eventually, gives me some reassurance to go ahead and use BigCouch for now.

Should I use BigCouch? Why wouldn't everybody use BigCouch, if it's just CouchDB + clustering? There must be some down-side, right?

score 2 · Accepted Answer · answered Jan 05 '13 at 05:11

My needs at my job are a bit different from yours, but I've done work with Couchbase, CouchDB and BigCouch. I found BigCouch very easy to set up in the cloud, and it only took one day to successfully create a cluster. We're investing in BigCouch and are committing to it for a major mobile initiative after doing our due diligence.

Reasons why:

BigCouch is fairly easy to set up in a cloud environment. The documentation is light, but I was able to get a simple cluster up and running quickly. I would recommend keeping an eye on the private hostnames of the machines in a cloud environment, since the nodes need to be able to reach each other by those names. (I can send along my detailed notes for creating machines in the cloud if that helps.)
BigCouch is maintained by Cloudant and of course it's open source, which is nice. The CTO of Cloudant told me they have already merged quite a bit of code into the Apache CouchDB project. Also Cloudant seems pretty stable, so we're counting on them to keep the project up to date. It seems like a good community (unlike something like TouchDB).
From what I can tell BigCouch mostly wraps itself around the core CouchDB code/APIs. This is good because it makes me think they started with CouchDB as the foundation and didn't try to do too much on top of it. For example, CouchDB's replication is already very good and BigCouch hasn't tried to re-invent the wheel. They just added some things that Couch was missing.
One downside to running BigCouch "raw" as opposed to with Cloudant is that Cloudant maintains their own internal fork that has more features. Our evaluation found that those features weren't needed though. They were a bit overkill for us.
Couchbase specifically seems to be a step behind. It took a long time to get to Couchbase 2.0 and I've been disappointed with Couchbase prior to 2.0. I hear 2.0 is great but haven't had a chance to use it yet. I've felt kind of burned with releases prior to 2.0 for various reasons.
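On the cluster setup point above, a rough sketch of what joining nodes looks like. As I recall from the BigCouch README, a node is added by sending a PUT with an empty JSON body to the backend port (5986 by default) at `/nodes/bigcouch@<hostname>`; treat the port and the `bigcouch@` node name as assumptions to check against your own install. The hostnames below are invented, and the code only builds the requests without sending them.

```python
import urllib.request


def join_node_request(coordinator_host, new_node_host, backend_port=5986):
    """Build (but do not send) the request that registers a node with a cluster."""
    url = "http://%s:%d/nodes/bigcouch@%s" % (
        coordinator_host,
        backend_port,   # assumed default backend port
        new_node_host,  # should be the name the other nodes can resolve
    )
    return urllib.request.Request(
        url,
        data=b"{}",
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical private hostnames of two machines joining the first one.
private_hosts = ["ip-10-0-0-11.internal", "ip-10-0-0-12.internal"]
join_requests = [
    join_node_request("ip-10-0-0-10.internal", host) for host in private_hosts
]

# urllib.request.urlopen(join_requests[0]) would actually perform the call.
```

This is where the private hostnames matter: the name after `bigcouch@` has to match how each node identifies itself and how its peers can reach it.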

score 1 · Answer 2 · answered Dec 18 '12 at 11:48

Not everyone needs clustering. The CouchDB team is intent on merging BigCouch soon after the almost-ready 1.3 release, so starting to look into BigCouch would certainly make sense (and I would personally definitely pick BigCouch over Couchbase or couchdb-lounge -- many of the BigCouch contributors are CouchDB committers, anyway).

score 0 · Answer 3 · answered Dec 18 '12 at 14:51

The downside of clustering is the extra complexity. I would argue that unless you're already an experienced CouchDB user, using BigCouch from day 1 is perhaps a step too far.

As an alternative to learning how to set up and maintain a BigCouch deployment, you could go for an online CouchDB host like Cloudant and let them deal with the complexity of managing a cluster of machines. All you deal with is something which still looks like your local CouchDB instance.

Regarding storing files in CouchDB, why not store them in S3? (A lot cheaper than Cloudant btw)

AndyD

I currently do store them in S3, but that adds a lot of additional complexity compared to, say, storing them in the same database as all of your other data. With S3, I have to create database records, upload to S3, update records with URLs, sign URLs when they need to be publicly accessed, etc. Just a lot of additional work. I was just looking at alternatives. – Ryan Dec 18 '12 at 15:51
For what it's worth: we do a similar thing with CouchDB + S3, except we don't make S3 public; we proxy it so that we can change the storage later. Initially we used CouchDB attachments, but they were going to be too expensive in Cloudant vs S3. There wasn't much complexity difference in our case (multiple attachment documents linked to 1 master document). I would be interested in an alternative to both approaches. – AndyD Dec 18 '12 at 16:05
The reason that I currently expose S3 to end-users, for uploading and downloading, is so that the bandwidth goes through whatever load balancing S3 has set up. If I were to proxy it through an EC2 instance first, that'd probably just create a bottleneck there. Although, there is the benefit of being able to seamlessly change where your files are served from without the end-users knowing. – Ryan Dec 18 '12 at 19:29
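For the "sign URLs" step and the proxy idea discussed in these comments, here is a minimal sketch of an expiring signed link that a proxy could check before serving a file. This is a generic HMAC scheme, not S3's own URL signing; the secret, the path layout, and the function names are all hypothetical.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical signing key known only to the proxy


def _signature(path, expires_at, secret):
    """HMAC-SHA256 over the path and its expiry time."""
    message = ("%s\n%d" % (path, expires_at)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_path(path, expires_at, secret=SECRET):
    """Return the path with expiry and signature appended as query parameters."""
    signature = _signature(path, expires_at, secret)
    return "%s?expires=%d&signature=%s" % (path, expires_at, signature)


def is_valid(path, expires_at, signature, now, secret=SECRET):
    """Accept the link only if it has not expired and the signature matches."""
    if now > expires_at:
        return False
    return hmac.compare_digest(_signature(path, expires_at, secret), signature)


# Hypothetical file path; timestamps are plain integers (seconds).
link = sign_path("/files/upload-0001/hello.txt", 2000)
link_signature = link.rsplit("signature=", 1)[1]
```

Because the link points at the proxy rather than at the bucket, the storage behind it can change without end-users noticing, at the cost of routing the bandwidth through your own servers.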
