My Hive table will hold call record data. Three of its columns are: field1 - CALL_DATE, field2 - FROM_PHONE_NUM, field3 - TO_PHONE.
I would run queries like these:
1) Get all call records between two particular dates.
2) Get all call records for a given FROM_PHONE_NUM between certain dates.
3) Get all call records for a given TO_PHONE between certain dates.
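To make the access patterns concrete, the queries would look roughly like the sketch below. The table name `call_records`, the date range, and the phone numbers are placeholders I made up for illustration, and I am assuming CALL_DATE is stored as a DATE or a 'yyyy-MM-dd' string.

```sql
-- Hypothetical table name and literal values, for illustration only.

-- 1) All call records between two dates
SELECT *
FROM call_records
WHERE CALL_DATE BETWEEN '2015-01-01' AND '2015-01-31';

-- 2) All call records for one calling number between two dates
SELECT *
FROM call_records
WHERE FROM_PHONE_NUM = '5550100'
  AND CALL_DATE BETWEEN '2015-01-01' AND '2015-01-31';

-- 3) All call records for one called number between two dates
SELECT *
FROM call_records
WHERE TO_PHONE = '5550199'
  AND CALL_DATE BETWEEN '2015-01-01' AND '2015-01-31';
```

Every query filters on a CALL_DATE range, and two of them additionally filter on exactly one of the phone number columns.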
The table size is approximately 6 TB.
How should I apply partitioning or bucketing to get better performance for all of these queries?