I would like to ask whether the following schema design for an HBase table is appropriate for this scenario: I receive 10 million events per day, each with a unix epoch timestamp and an id. I need to group events by day so that I can easily scan for all events that happened on a specific day.
Current design: each event's timestamp is converted to a string of the form "MM-YYYY_DD", which is used as the row key, and the id of every event that occurred on that day is stored as a column in that row. This results in up to 10 million columns in a single row. As far as I understand HBase, writes to a single row are serialized by a row lock, so importing a single day's events would cause heavy lock contention and degrade write performance.
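For illustration, here is roughly what my current write path looks like (a minimal sketch against the HBase 2.x Java client; the table name `events` and column family `e` are placeholders, not my real names). Every event of a given day targets the same row key, which is why I expect all concurrent writers to pile up on one row's lock:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Current design: one row per day, one column per event id.
// All events of the same day hit the SAME row key, so concurrent
// writers serialize on that row's lock.
void storeEvent(Connection conn, String dayKey, String eventId) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("events"))) {
        Put put = new Put(Bytes.toBytes(dayKey));   // e.g. "MM-YYYY_DD"
        put.addColumn(Bytes.toBytes("e"),           // assumed column family
                      Bytes.toBytes(eventId),       // event id as qualifier
                      new byte[0]);                 // no payload needed
        table.put(put);
    }
}
```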
Maybe this would be a better design: use the unix epoch timestamp as the row key, resulting in many rows with up to a few thousand columns each (several events may occur in the same second, since my timestamps have a maximum resolution of one second). To query a day, one can calculate its start and end times as epoch timestamps and scan that range.
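A sketch of this alternative, again under the same assumed names (`events` table, family `e`): the row key is the 8-byte big-endian encoding of the epoch second (which is what `Bytes.toBytes(long)` produces, so rows sort chronologically), and the event id stays in the column qualifier to keep events from the same second apart. A day then becomes a plain range scan:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Alternative design: epoch second as row key, event id as qualifier.
// Bytes.toBytes(long) is fixed-width big-endian, so row keys sort in
// time order and one day maps to a contiguous key range.
void storeEvent(Connection conn, long epochSecond, String eventId) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("events"))) {
        Put put = new Put(Bytes.toBytes(epochSecond));
        put.addColumn(Bytes.toBytes("e"), Bytes.toBytes(eventId), new byte[0]);
        table.put(put);
    }
}

// Scan all events of one day: [dayStart, dayStart + 86400) in epoch seconds.
// withStartRow/withStopRow are the HBase 2.x names (1.x uses setStartRow/setStopRow).
void scanDay(Connection conn, long dayStartEpochSecond) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("events"))) {
        Scan scan = new Scan()
                .withStartRow(Bytes.toBytes(dayStartEpochSecond))           // inclusive
                .withStopRow(Bytes.toBytes(dayStartEpochSecond + 86400L));  // exclusive
        try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                long second = Bytes.toLong(result.getRow());
                // each qualifier in family "e" is one event id from that second
            }
        }
    }
}
```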