I have an external table, partitioned on 3 columns and stored in HDFS:
/apps/var/db_name/table_name/date=1901/ref_src=src_a
/apps/var/db_name/table_name/date=1901/ref_src=src_b
/apps/var/db_name/table_name/date=1902/ref_src=src_a
/apps/var/db_name/table_name/date=1902/ref_src=src_b
/apps/var/db_name/table_name/date=1903/ref_src=src_a
/apps/var/db_name/table_name/date=1903/ref_src=src_b
/apps/var/db_name/table_name/date=1903/ref_src=src_c
Fields in table_name:
date|ref_src|col_a|col_b|
Now, based on a new requirement, I have to add two new columns, col_c and col_d. I plan to write a program that calculates them and overwrites the data in the same table. My worry is: what if something goes wrong while the program is running, or the program itself has a bug, and the table data ends up corrupted or deleted?
So my main question is: how do I back up the table, meaning both the table data in HDFS and the partition details? Is copying the complete table directory enough as a backup, or is there anything else I need to take care of? My main concern is the prod data.