I am working on a Hive solution in which I need to append some values to high-volume files. Instead of appending to the files directly, I am trying a MapReduce-based approach. The approach is below:
Table creation:
create external table demo_project_data(data string) PARTITIONED BY (business_date string, src_sys_file_nm string, prd_typ_cd string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
LOCATION '/user/hive/warehouse/demo/project/data';
hadoop fs -mkdir -p /user/hive/warehouse/demo/project/data/business_date='20180707'/src_sys_file_nm='a_b_c_20180707_1.dat.gz'/prd_typ_cd='abcd'
echo "ALTER TABLE demo_project_data ADD IF NOT EXISTS PARTITION(business_date='20180707',src_sys_file_nm='a_b_c_20180707_1.dat.gz',prd_typ_cd='abcd')
LOCATION '/user/hive/warehouse/demo/project/data/business_date=20180707/src_sys_file_nm=a_b_c_20180707_1.dat.gz/prd_typ_cd=abcd';"|hive
hadoop fs -cp /apps/tdi/data/a_b_c_20180707_1.dat.gz /user/hive/warehouse/demo/project/data/business_date='20180707'/src_sys_file_nm='a_b_c_20180707_1.dat.gz'/prd_typ_cd='abcd'
echo "INSERT OVERWRITE DIRECTORY '/user/20180707' select *,'~karthick~kb~demo' from demo_project_data where src_sys_file_nm='a_b_c_20180707_1.dat.gz' and business_date='20180707' and prd_typ_cd='abcd';"|hive
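The single quotes around the partition values in the hadoop fs -mkdir and hadoop fs -cp commands never reach HDFS: the shell strips them during quote removal, so the directories are created with plain key=value names. The small check below (it does not touch HDFS; the variable name p is only for illustration) prints the path the shell actually passes for the same directory, and the quote-free LOCATION string that would match it.

```shell
#!/usr/bin/env bash
# Same partition path as in the mkdir/cp commands, quotes included.
p=/user/hive/warehouse/demo/project/data/business_date='20180707'/src_sys_file_nm='a_b_c_20180707_1.dat.gz'/prd_typ_cd='abcd'

# Quote removal already happened at assignment, so this prints:
# /user/hive/warehouse/demo/project/data/business_date=20180707/src_sys_file_nm=a_b_c_20180707_1.dat.gz/prd_typ_cd=abcd
printf '%s\n' "$p"

# A Hive LOCATION clause pointing at that directory has no inner quotes.
printf "LOCATION '%s';\n" "$p"
```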
The file contains data, but the INSERT OVERWRITE DIRECTORY query returns no results. The files are copied to the correct location, and the query itself runs without errors. What am I doing wrong?
Also, I will be looping over multiple dates. Is this the right way to do it?
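For looping over several dates, one possible shape is a dry-run generator like the sketch below. It assumes, hypothetically, that each date has exactly one input file named a_b_c_<date>_1.dat.gz with product type abcd (generalized from the single example above); the helper names part_dir, check_dates, stage_cmds and build_hql are made up for illustration. It only prints commands: the hadoop staging commands per date, plus one HiveQL script covering all dates, so that Hive would be started once rather than once per statement.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints commands for several business dates, executes nothing.
# ASSUMPTION (hypothetical): one file per date named a_b_c_<date>_1.dat.gz,
# product type 'abcd', source directory /apps/tdi/data.
BASE=/user/hive/warehouse/demo/project/data
SRC=/apps/tdi/data
PRD=abcd

# Partition directory for one date, written without quote characters.
part_dir() {
  printf '%s/business_date=%s/src_sys_file_nm=a_b_c_%s_1.dat.gz/prd_typ_cd=%s\n' \
    "$BASE" "$1" "$1" "$PRD"
}

# Fail on anything that is not an 8-digit YYYYMMDD value.
check_dates() {
  local bd
  for bd in "$@"; do
    [[ $bd =~ ^[0-9]{8}$ ]] || { echo "bad date: $bd" >&2; return 1; }
  done
}

# Shell commands that stage each date's file into its partition directory.
stage_cmds() {
  check_dates "$@" || return 1
  local bd
  for bd in "$@"; do
    echo "hadoop fs -mkdir -p $(part_dir "$bd")"
    echo "hadoop fs -cp $SRC/a_b_c_${bd}_1.dat.gz $(part_dir "$bd")"
  done
}

# One HiveQL script for all dates: register each partition, then export it.
build_hql() {
  check_dates "$@" || return 1
  local bd file
  for bd in "$@"; do
    file="a_b_c_${bd}_1.dat.gz"
    printf "ALTER TABLE demo_project_data ADD IF NOT EXISTS PARTITION (business_date='%s', src_sys_file_nm='%s', prd_typ_cd='%s') LOCATION '%s';\n" \
      "$bd" "$file" "$PRD" "$(part_dir "$bd")"
    printf "INSERT OVERWRITE DIRECTORY '/user/%s' SELECT *, '~karthick~kb~demo' FROM demo_project_data WHERE business_date='%s' AND src_sys_file_nm='%s' AND prd_typ_cd='%s';\n" \
      "$bd" "$bd" "$file" "$PRD"
  done
}
```

Usage would be to run the output of stage_cmds 20180707 20180708 first, then build_hql 20180707 20180708 > load.hql followed by hive -f load.hql. Since the staged directories sit under the table location with key=value names, MSCK REPAIR TABLE demo_project_data; is another way to register them in one statement instead of one ALTER TABLE per partition.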