I'm trying to understand how data will be laid out on disk by the query below.
CREATE TABLE mytable (
name string,
city string,
employee_id int )
PARTITIONED BY (year STRING, month STRING, day STRING)
CLUSTERED BY (employee_id) INTO 256 BUCKETS;
The PARTITIONED BY clause will distribute the data into a directory structure like this:
/user/hive/warehouse/mytable/year=2015/month=12/day=02
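To make my understanding of the partition layout concrete: as far as I know, Hive names each partition directory after the partition column and its value (key=value), nested in the order the columns appear in PARTITIONED BY. Here is a small Python sketch of that naming only. The helper name partition_dir and the sample values are made up for illustration, and this is not Hive's own code.

```python
# Warehouse root taken from the example path in the question.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def partition_dir(table, partition_values):
    """Return the directory for one partition.

    partition_values is an ordered list of (column, value) pairs, in the
    same order as the columns in the PARTITIONED BY clause.
    (Hypothetical helper, for illustration only.)
    """
    segments = ["{}={}".format(col, val) for col, val in partition_values]
    return "/".join([WAREHOUSE_ROOT, table] + segments)


# One partition of mytable for 2 December 2015 (sample values).
print(partition_dir("mytable", [("year", "2015"), ("month", "12"), ("day", "02")]))
```

This only covers the directory part. It says nothing about where the bucket files go, which is the part I am asking about.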
But I'm not able to understand how the employee_id values will be distributed among these directories. 256 buckets (files) will be created, and between them those files will hold all the employee_id values,
but how is it decided which file sits under which directory?
Can anyone help me understand this?