While researching column-oriented databases, I repeatedly read that "the primary key is the data" (e.g., in Column-oriented DBMS).
I took this to mean that I could randomly access any cell in a given column by its value, because the values (the data) are already indexed as the primary key.
But after I put more than 3M rows into HBase, the HBase shell command
scan 'lottery', {COLUMNS => 'cf:status', FILTER => "ValueFilter(=, 'binary:win')"}
takes more than 3 seconds.
(It keeps getting slower as more rows are added.)
'win' and 'lose' are the two possible values for the column cf:status, and there is only 1 row whose value is 'win'.
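To show the behaviour I suspect I am running into, here is a minimal toy sketch in Python. It is not HBase code: the class `SortedKeyTable`, the `ticket…` row keys, and the row count are all made up for illustration. It assumes a table that keeps only its row keys sorted and does not index cell values, so a lookup by row key is a binary search while a filter on a value has to examine every row.

```python
import bisect


class SortedKeyTable:
    """Toy model of a table sorted by row key (hypothetical, not real HBase)."""

    def __init__(self, rows):
        # rows: dict mapping row key -> value of a single column
        self.keys = sorted(rows)
        self.values = [rows[k] for k in self.keys]

    def get(self, row_key):
        """Look up by row key using binary search on the sorted keys."""
        i = bisect.bisect_left(self.keys, row_key)
        if i < len(self.keys) and self.keys[i] == row_key:
            return self.values[i]
        return None

    def scan_by_value(self, wanted):
        """Filter by cell value: every row is examined because values are not indexed."""
        examined = 0
        matches = []
        for key, value in zip(self.keys, self.values):
            examined += 1
            if value == wanted:
                matches.append(key)
        return matches, examined


# 100,000 made-up rows, exactly one of which has the value 'win'.
rows = {f"ticket{i:07d}": "lose" for i in range(100_000)}
rows["ticket0042424"] = "win"
table = SortedKeyTable(rows)

print(table.get("ticket0042424"))              # fast: found via the sorted keys
matches, examined = table.scan_by_value("win")
print(matches, examined)                       # one match, but all rows examined
```

In this toy model the number of rows examined by `scan_by_value` grows with the table size even though only one row matches, which is the same pattern I see with the scan above.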
I may have misunderstood something.
What does "the primary key is the data" mean in a column-oriented DB?
Thank you.