Like Ryan mentioned, how you access your data has a lot to do with it. Since Ryan covered the MongoDB side (which I know little about), I'll give the HBase side of things a shot.
For starters, I suggest you read the BigTable paper, since HBase was heavily influenced by its design. This video also has some good details on HBase's design elements. And if you're more interested in ZooKeeper, try reading the Chubby paper too.
Things to consider for HBase:
Indexing rows: The way rows are "indexed" in HBase (or in Cassandra with the ordered partitioner) is both its blessing and its curse. I believe MongoDB uses a B+Tree for its indexes (correct me if I'm wrong), whereas HBase simply stores rows in sorted key order. This approach is good for MapReduce jobs and sequential reads. For MapReduce jobs it gives you data locality on the region servers that run the tasks. It helps sequential reads because the disk controllers can read sequential sectors on disk while doing a "scan" over keys. The curse is that the data is stored in order... so if you don't design your row keys well, you end up with "hot" nodes. For example, if you simply used a timestamp as a row key, you could end up with one node taking all the writes while your other nodes sit idle. Designing your row keys in HBase is therefore very important. This video on OpenTSDB goes into some good detail about how they use HBase.
Another advantage of column-oriented databases is that they can use column compression instead of row compression. Since the entropy of a column is normally much lower than that of a row, compression is more effective. For example, if your columns store user agents, URLs, or keywords, they will compress really well.
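To make that concrete for HBase: compression is configured per column family when you create (or alter) a table. Here's a minimal sketch with the older-style Java admin API, assuming a placeholder table "visitor_metrics" and family "m" (exact package names vary a bit between HBase versions, and Snappy has to be available on the cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateMetricsTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor table = new HTableDescriptor("visitor_metrics"); // placeholder table name
        HColumnDescriptor family = new HColumnDescriptor("m");            // placeholder column family
        family.setCompressionType(Compression.Algorithm.SNAPPY);          // compress this family's store files
        table.addFamily(family);

        admin.createTable(table);
        admin.close();
    }
}
```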
Example HBase solution:
Let's say you want to build a solution for tracking visitor data on your e-commerce site, with a requirement to support aggregates over any date range. Because HBase stores keys sequentially on disk, well-designed keys may give you better performance for real-time sequential scans.
For this example, let's assume we store lots of metrics about visitors under the following key schema:

{product-category}.{sub-category}.{metric}.{timestamp-rounded-to-the-minute}

For example, a single page visit might write to the following keys:
shoes.running.search-terms.1362818100,
shoes.running.user-agents.1362818100,
shoes.running.visitors-country.1362818100, ...

Side note: all of these keys are basically sequential and would most likely be written to a single region server, so you may want these writes distributed across more than one machine. One solution would be to replace the {product-category}.{sub-category} part of the key with HashOf({product-category}.{sub-category}); another is to use a key lookup table the way OpenTSDB does.
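To make that side note concrete, here's a minimal write sketch with the Java client, using the same placeholder table/family as above and a short hash of the {product-category}.{sub-category} prefix. The column layout (one column per search term, value = count) is just an assumption for this answer:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class MetricWriter {
    // builds {hash-of-category-prefix}.{metric}.{timestamp-rounded-to-the-minute}
    static String rowKey(String category, String subCategory, String metric, long minuteTs) {
        String prefix = MD5Hash.getMD5AsHex(Bytes.toBytes(category + "." + subCategory)).substring(0, 8);
        return prefix + "." + metric + "." + minuteTs;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "visitor_metrics");   // placeholder table name

        long minuteTs = 1362818100L;                          // timestamp rounded to the minute
        Put put = new Put(Bytes.toBytes(rowKey("shoes", "running", "search-terms", minuteTs)));
        // one column per search term in the "m" family; the value here is just a count
        put.add(Bytes.toBytes("m"), Bytes.toBytes("blue trail shoes"), Bytes.toBytes(1L));
        table.put(put);

        table.close();
    }
}
```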
With this key design, it becomes fast to run ad-hoc/real-time queries over these metrics. For example, to get all the search terms used between 1331666259 (Tue, 13 Mar 2012) and 1334344659 (Fri, 13 Apr 2012), you would issue a scan from shoes.running.search-terms.1331666259 to shoes.running.search-terms.1334344659.
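As a rough sketch with the Java client (again using the placeholder table and family from above; if you went with the hashed prefix, you would compute the same hash for the start and stop keys):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SearchTermScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "visitor_metrics");   // placeholder table name

        // start row is inclusive, stop row is exclusive
        Scan scan = new Scan(
                Bytes.toBytes("shoes.running.search-terms.1331666259"),
                Bytes.toBytes("shoes.running.search-terms.1334344659"));
        scan.addFamily(Bytes.toBytes("m"));                   // placeholder column family

        ResultScanner scanner = table.getScanner(scan);
        for (Result row : scanner) {
            // each row holds one minute's worth of search-term columns; aggregate as needed
            int terms = row.getFamilyMap(Bytes.toBytes("m")).size();
            System.out.println(Bytes.toString(row.getRow()) + " -> " + terms + " terms");
        }
        scanner.close();
        table.close();
    }
}
```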