I am trying to find a good way to organize my row-keys to perform range scans on them without creating my own index lists.
I have a MySQL server with currently about 15.000 databases, each with ~50 tables = 75.000 tables. Because 99% of the data is always read by a unique identifier, I plan to move that data into a Cassandra cluster.
For some maintenance cases (listing the contents of a complete table, removing a complete table, or dropping a database) I need to get the contents of a complete table or even a whole database. Range scans seem to be the perfect fit for that.
Currently I am planning to generate a UUID for each part of the old structure and join them with | as a separator
(DB + Table + Id = UUID1|UUID2|UUID3).
Example:
07424eaa-4761-11e1-ac67-12313c033ac4|0619a6ec-4525-11e1-906e-12313c033ac4|0619a6ec-4795-12e9-906e-78313c033ac4
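To make the key layout concrete, here is a minimal sketch of how such a key would be assembled. It is written in Python purely to illustrate the string handling (my actual client is phpcassa), and `make_row_key` is a hypothetical helper of my own, not part of any library.

```python
import uuid

SEPARATOR = "|"  # separator between the DB, table and row parts


def make_row_key(db_id, table_id, row_id):
    """Join the three UUIDs into one row key (hypothetical helper)."""
    return SEPARATOR.join(str(part) for part in (db_id, table_id, row_id))


# The UUIDs from the example above; in practice they would be generated
# once per database, table and row.
db_id = uuid.UUID("07424eaa-4761-11e1-ac67-12313c033ac4")
table_id = uuid.UUID("0619a6ec-4525-11e1-906e-12313c033ac4")
row_id = uuid.UUID("0619a6ec-4795-12e9-906e-78313c033ac4")

row_key = make_row_key(db_id, table_id, row_id)
print(row_key)
```

Every key built this way has the same length and contains only the characters 0-9, a-f, - and |, which is what the range bounds described further down rely on.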
The CF with the data should be sorted with org.apache.cassandra.db.marshal.AsciiType.
As a client I am using phpcassa.
For the range scans I want to use UUID| as the start key and, as the end of the range, the same key with chr(255) or z appended to it. The ASCII value of both characters is greater than that of any UUID character that can follow in those keys.
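The bounds I have in mind can be sketched as follows. This is again Python used only for illustration, and it only simulates the string comparison: it assumes the row keys are compared in plain ASCII byte order and does not talk to Cassandra at all. `range_bounds` and `in_range` are hypothetical helpers, and I use z rather than chr(255) as the terminator here.

```python
SEPARATOR = "|"


def range_bounds(prefix_parts, terminator="z"):
    """Return (start, end) keys covering every key under the given prefix.

    prefix_parts is [db_uuid] for a whole database or
    [db_uuid, table_uuid] for a single table (hypothetical helper).
    """
    start = SEPARATOR.join(prefix_parts) + SEPARATOR
    end = start + terminator
    return start, end


def in_range(key, start, end):
    """Simulate an inclusive range check in ASCII order (assumption)."""
    return start <= key <= end


db = "07424eaa-4761-11e1-ac67-12313c033ac4"
table = "0619a6ec-4525-11e1-906e-12313c033ac4"
row = "0619a6ec-4795-12e9-906e-78313c033ac4"
key = SEPARATOR.join([db, table, row])

# A key that belongs to a different database (last digit of the first
# group changed) must fall outside both ranges.
other_db_key = key.replace("07424eaa", "07424eab", 1)

db_start, db_end = range_bounds([db])
table_start, table_end = range_bounds([db, table])

print(in_range(key, db_start, db_end))
print(in_range(key, table_start, table_end))
print(in_range(other_db_key, db_start, db_end))
```

The check works in this sketch because the first character after the prefix is always a hex digit, and z sorts after 0-9, a-f, - and | in ASCII.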
Is this a solid approach for achieving the range-scan goals described above?