My Hive table will hold call record data. Three of its columns are CALL_DATE, FROM_PHONE_NUM, and TO_PHONE.

I expect to run queries like these: 1) get all call records between particular dates; 2) get all call records for a given FROM_PHONE_NUM between certain dates; 3) get all call records for a given TO_PHONE between certain dates.

My table size is approximately 6TB.

How should I apply partitioning or bucketing to get better performance for all of these queries?

AKC

1 Answer

Every one of your queries filters on a date range first, so partition the table by date.

How to create Link for dynamic partition

You can use the date in yyyymmdd format as the partition key (for example, 20170406 for today, 6 April 2017).
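As a rough sketch of what this could look like in HiveQL: the table name `call_records`, the staging table `call_records_staging`, the ORC storage format, and the sample phone number are all assumptions made for illustration, not something taken from the question. Only the three columns and the yyyymmdd partition key come from the discussion above.

```sql
-- Hypothetical table; only the three columns come from the question.
-- The partition column is declared in PARTITIONED BY, not in the column list.
CREATE TABLE call_records (
  from_phone_num STRING,
  to_phone       STRING
)
PARTITIONED BY (call_date STRING)   -- yyyymmdd, e.g. '20170406'
STORED AS ORC;                      -- storage format is an assumption

-- Allow Hive to create one partition per distinct call_date on insert.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- call_records_staging is a hypothetical unpartitioned source table.
-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE call_records PARTITION (call_date)
SELECT from_phone_num, to_phone, call_date
FROM call_records_staging;

-- A date-range query with a filter on the calling number.
-- The predicate on call_date lets Hive read only the matching partitions.
SELECT *
FROM call_records
WHERE call_date BETWEEN '20170401' AND '20170406'
  AND from_phone_num = '5551234567';
```

Because yyyymmdd strings sort in the same order as the dates they represent, a `BETWEEN` on the string partition key selects the intended date range.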

sandeep rawat
  • How do I make my query faster if I want to fetch call records for a given from-date? – AKC Apr 06 '17 at 13:41
  • If I understood you correctly, you want to fetch call records for a given date. If the data is partitioned by date as described in the answer, Hive reads only the matching partition, so the query will be fast. – sandeep rawat Apr 07 '17 at 02:40