I am trying to pre split hbase table. One the HbaseAdmin java api is to create an hbase table is function of startkey, endkey and number of regions. Here's the java api that I use from HbaseAdmin void createTable(HTableDescriptor desc, byte[] startKey, byte[] endKey, int numRegions)
Is there any recommendation on choosing startkey and endkey based on dataset?
My approach is lets say we have 100 records in dataset. I want data divided approximately in 10 regions so each will have approx 10 records. so to find startkey i will say scan '/mytable', {LIMIT => 10}
and pick the last rowkey as my startkey and then scan '/mytable', {LIMIT => 90}
and pick the last rowkey as my endkey.
Does this approach to find startkey and rowkey looks ok or is there better practice?
EDIT I tried following approaches to pre split empty table. ALl three didn't work the way I used it. I think I will need to salt the key to get equal distribution.
PS> I am displaying only some region info
1)
byte[][] splits = new RegionSplitter.HexStringSplit().split(10);
hBaseAdmin.createTable(tabledescriptor, splits);
This gives regions with boundaries like:
{
"startkey":"-INFINITY",
"endkey":"11111111",
"numberofrows":3628951,
},
{
"startkey":"11111111",
"endkey":"22222222",
},
{
"startkey":"22222222",
"endkey":"33333333",
},
{
"startkey":"33333333",
"endkey":"44444444",
},
{
"startkey":"88888888",
"endkey":"99999999",
},
{
"startkey":"99999999",
"endkey":"aaaaaaaa",
},
{
"startkey":"aaaaaaaa",
"endkey":"bbbbbbbb",
},
{
"startkey":"eeeeeeee",
"endkey":"INFINITY",
}
This is useless as my rowkeys are of composite form like 'deptId|month|roleId|regionId'
and doesn't fit into above boundaries.
2)
byte[][] splits = new RegionSplitter.UniformSplit().split(10);
hBaseAdmin.createTable(tabledescriptor, splits)
This has same issue:
{
"startkey":"-INFINITY",
"endkey":"\\x19\\x99\\x99\\x99\\x99\\x99\\x99\\x99",
}
{
"startkey":"\\x19\\x99\\x99\\x99\\x99\\x99\\x99\\
"endkey":"33333332",
}
{
"startkey":"33333332",
"endkey":"L\\xCC\\xCC\\xCC\\xCC\\xCC\\xCC\\xCB",
}
{
"startkey":"\\xE6ffffffa",
"endkey":"INFINITY",
}
3) I tried supplying start key and end key and got following useless regions.
hBaseAdmin.createTable(tabledescriptor, Bytes.toBytes("04120|200808|805|1999"),
Bytes.toBytes("01253|201501|805|1999"), 10);
{
"startkey":"-INFINITY",
"endkey":"04120|200808|805|1999",
}
{
"startkey":"04120|200808|805|1999",
"endkey":"000PTP\\xDC200W\\xD07\\x9C805|1999",
}
{
"startkey":"000PTP\\xDC200W\\xD07\\x9C805|1999",
"endkey":"000ptq<200wp6\\xBC805|1999",
}
{
"startkey":"001\\x11\\x15\\x13\\x1C201\\x15\\x902\\x5C805|1999",
"endkey":"01253|201501|805|1999",
}
{
"startkey":"01253|201501|805|1999",
"endkey":"INFINITY",
}