I have a Hive table with millions of rows, and I would like to partition it on a column 'id' whose values are unique. I understand that partitioning directly on a unique column is not good practice, because it would create a very large number of files and directories, which could slow processing down. Is there a way to create one partition on the 'id' column for every 10k or 30k records instead, so that performance improves? For example:
create table test(name string, note string) partitioned by(id int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE
LOCATION 'hdfs://somelocation/';
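To make the "every 10k records" idea concrete, here is a rough sketch of what I imagine, not something I have tested. The names test_by_range, test_source, and id_range are hypothetical; the idea is that floor(id / 10000) places 10k consecutive ids in the same partition, assuming the ids are roughly sequential integers.

```sql
-- Hypothetical table: id becomes a regular column, and the partition
-- column id_range is derived from it (one partition per 10,000 ids).
CREATE TABLE test_by_range (id INT, name STRING, note STRING)
PARTITIONED BY (id_range INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Allow Hive to create partitions from the values in the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- test_source is an assumed unpartitioned staging table with the same columns.
-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE test_by_range PARTITION (id_range)
SELECT id, name, note, CAST(floor(id / 10000) AS INT)
FROM test_source;
```

With this layout a query would have to filter on id_range as well as id to benefit from partition pruning, which is part of why I am unsure whether this is the right approach.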
Also, if there is a date-type column, can I partition on just the year and month of that column? For example:
PARTITIONED BY (year bigint, month bigint)
or should year and month be combined into a single partition column?
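For reference, this is roughly how I picture the two layouts. It is only a sketch under my own assumptions: test_by_month, test_source, and the date column created are hypothetical names, and I have not verified that this is the recommended way to do it.

```sql
-- Layout A: separate year and month partition columns,
-- derived from a hypothetical DATE column named created.
CREATE TABLE test_by_month (id INT, name STRING, note STRING, created DATE)
PARTITIONED BY (year INT, month INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition values are matched by position, so the year and
-- month expressions come last, in the order given in PARTITION (...).
INSERT OVERWRITE TABLE test_by_month PARTITION (year, month)
SELECT id, name, note, created, year(created), month(created)
FROM test_source;

-- Layout B (alternative): a single partition column, e.g. year_month STRING,
-- filled with date_format(created, 'yyyy-MM') instead of two columns.
```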