
I have a BigCouch cluster with Q=256, N=3, R=2, W=2. Everything seems to be up and running, and I can read and write small test documents. The application is in Python and uses the CouchDB library. The cluster has 3 nodes, each running CentOS on VMware with 3 cores and 6GB RAM. Versions: BigCouch 0.4.0, CouchDB 1.1.1, Erlang R14B04; CentOS Linux release 6.0 (Final) on EC2 and CentOS release 6.2 (Final) on VMware 5.0.
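
For reference, Q and N are fixed per database at creation time (R and W apply per request). Here is a minimal sketch of creating a database with those settings; the host name is a placeholder, and it assumes the BigCouch build honors the q and n query parameters on database creation:

import requests  # any HTTP client works; requests is used here for illustration

BASE = "http://yourhost:5984"  # placeholder host

# Create a database with 256 shards (Q) and 3 replicas (N),
# matching the cluster configuration described above.
resp = requests.put(BASE + "/test", params={"q": 256, "n": 3})
print(resp.status_code, resp.text)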

On startup the application attempts a bulk insert of 412 documents totaling about 490KB of data. This works fine with N=1, so the document contents aren't the issue. But with N=3 I seem to randomly get one of these results (a minimal sketch of the insert call follows the list):

  • write completes in about 9 sec
  • write completes in about 24 sec (nothing in between)
  • write fails after about 30 sec (some documents were inserted)
  • Erlang crashes after about 30 sec (some documents were inserted)
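
For context, the insert itself is a single _bulk_docs POST. A minimal sketch of the Python side using the couchdb library, assuming file.json holds the usual {"docs": [...]} bulk payload and the host name is a placeholder:

import json
import couchdb  # couchdb-python, the client library mentioned above

server = couchdb.Server("http://yourhost:5984/")  # placeholder host
db = server["test"]

# Load the same 412 documents used in the curl reproduction below.
with open("file.json") as f:
    docs = json.load(f)["docs"]

# Database.update() sends the whole list to /test/_bulk_docs in one request.
results = db.update(docs)
failures = [r for r in results if not r[0]]
print("inserted %d, failed %d" % (len(results) - len(failures), len(failures)))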

vmstat shows near-100% CPU utilization; top shows this is mostly the Erlang process, and truss shows the time is mostly spent in "futex" calls. Disk usage jumps up and down during the operation, but CPU remains pegged.

The logs show lovely messages like:

"could not load validation funs {{badmatch, {error, timeout}}, [{couch_db, '-load_validation_funs/1-fun-1-', 1}]}"

"Error in process <0.13489.10> on node 'bigcouch-test02@bigcouch-test02.oceanobservatories.org' with exit value: {{badmatch,{error,timeout}},[{couch_db,'-load_validation_funs/1-fun-1-',1}]}"

And of course there are Erlang dumps.

From reading about other people's use of BigCouch, this certainly isn't a large update. Our VMs seem beefy enough for the job. I can reproduce with cURL and a JSON file, so it isn't the application. (Can post that too if it helps.)

Can anyone explain why 9 cores and 18GB RAM can't handle a (3x) 490KB write?

More info in case it helps:

Can reproduce with the following commands (save the JSON documents mentioned above as file.json):

url=http://yourhost:5984
curl -X PUT $url/test
curl -X POST $url/test/_bulk_docs -d @file.json -H "Content-Type: application/json"

1 Answer


Got a suggestion that Q=256 may be the issue, and found that BigCouch does slow down a lot as Q grows. This surprised me: I would have thought the hashing and delegation would be pretty lightweight, but perhaps it dedicates too many resources to each DB shard.

As Q grows from too small to allow any real cluster growth to maybe big enough for BigData, the time to do my 490KB update grows from uncomfortably slow to unreasonably slow, and finally into the realm of BigCouch crashes. Here is the time to insert as Q varies, with N=3, R=W=2, and the 3-node cluster described above (a sketch of the measurement loop follows the table):

Q      sec
4      6.4
8      7.7
16    10.8
32    16.9
64    37.0  <-- specific suggestion in adam@cloudant's webcast
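
The numbers above came from re-running the same bulk insert against databases created with different q values. A rough sketch of the measurement loop, assuming the placeholder host, the same file.json payload as in the question, and that the build honors the q query parameter on database creation:

import time
import requests  # plain HTTP client, used here just for timing

BASE = "http://yourhost:5984"  # placeholder host
with open("file.json") as f:   # same 412-document payload as in the question
    payload = f.read()

for q in (4, 8, 16, 32, 64):
    requests.delete(BASE + "/test")                        # drop the previous run
    requests.put(BASE + "/test", params={"q": q, "n": 3})  # recreate with q shards
    start = time.time()
    r = requests.post(BASE + "/test/_bulk_docs", data=payload,
                      headers={"Content-Type": "application/json"})
    print("q=%-3d status=%d  %.1f sec" % (q, r.status_code, time.time() - start))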

This seems like an Achilles' heel for BigCouch: despite the advice to overshard to allow for later cluster growth, you can't have many shards unless you already have a moderate-sized cluster or some powerful hardware!
