1

I have a couchdb instance with database a and database b. They should contain identical sets of documents, except that the _rev property will be different, which, AIUI, means I can't use replication.

How do I verify that the two databases really do contain the same documents which are all otherwise 'equal'?

I've tried using the python-based couchdb-dump tool with a lot of sed magic to get rid of the _rev and MD5 and ETag headers, but then it still seems that property order in the JSON structure is slightly random, which means I still can't compare the output easily with something like diff.

Is there a better approach here? Have other people wanted to solve a similar problem?

Gijs
  • 5,201
  • 1
  • 27
  • 42

1 Answers1

1

If you want to make sure they're exactly the same, write a map job that emits the document path as the key, and the documents hash (generated any way you like) as the value. Do not include the _rev field in the hash generation.

You cannot reduce to a single hash because order is not guaranteed, but you can feed the resultant JSON document to a good diff program.

Jacob Groundwater
  • 6,581
  • 1
  • 28
  • 42
  • Good point, that worked for me. I was too lazy to write something properly (I'm testing a tool's compatibility, so the actual DB contents or how fast it is is moot), but just concatenating document property key/values (and sorting the keys before doing that) worked. Thanks! – Gijs May 16 '12 at 10:59