
I'm having a lot of difficulty loading certain directories and processing them.

The idea is that I want to process all unprocessed files. To do so, I store the processing timestamp in HDFS every time I finish processing. That way it's much easier to determine whether files have been processed or not (by comparing the last processing timestamp with the current timestamp).
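The bookkeeping described above comes down to a half-open timestamp comparison. Here is a minimal Python sketch. It assumes a hypothetical layout where every input directory is named after an integer timestamp in the same unit as the stored last-processing time. That layout is an assumption, not something the question states.

```python
def unprocessed(dir_names, last_processed, now):
    """Return the directory names whose timestamp t satisfies last_processed < t <= now.

    Assumption (hypothetical layout): every directory name is a plain integer
    timestamp in the same unit as last_processed and now.
    """
    selected = [name for name in dir_names if last_processed < int(name) <= now]
    # Sort numerically so the batch is processed in time order.
    return sorted(selected, key=int)
```

Once a run succeeds, `now` becomes the new last-processing timestamp, which is the value written back to HDFS for the next run.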

Here's my script:

--process latest
register hdfs:/udf/myudf.jar;
define toDate tech.main.tics.convertDate();
define startTS tech.main.tics.startTS();
define endTS tech.main.tics.endTS();


raw = LOAD 'hdfs:/home/raw/report/last_process_time/part-r-00000' AS (DATE);
start_ts = foreach raw generate startTS(DATE); 
end_ts = FOREACH raw GENERATE endTS(ToUnixTime(CurrentTime()));

store start_ts into '/home/raw/report/start-ts';
store end_ts into '/home/raw/report/end-ts';

run -param START=/home/raw/report/start-ts/part-m-00000 -param END=/home/raw/report/end-ts/part-r-00000 hdfs:/home/raw/pig-script/update_test.pig

And here's my update_test.pig:

register 'hdfs:/udf/elephant-bird-pig-4.10.jar';
register 'hdfs:/udf/elephant-bird-core-4.10.jar';
register 'hdfs:/udf/elephant-bird-hadoop-compat-4.10.jar';
register 'hdfs:/udf/json-simple-1.1.1.jar';
register hdfs:/udf/myudf.jar;
define toDate tech.main.tics.convertDate();
define toBag tech.main.tics.MapToBag();

last_processed = LOAD 'hdfs:/home/raw/report/last_process_time/part-r-00000' AS (DATE);
previous1 = LOAD 'hdfs:/home/raw/report/events_by_application/part-r-00000';

raw = LOAD '/home/raw/dummy-logs/{$START..$END}/*' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
scene = foreach raw generate
        (float)json#'value' AS VALUE,
        (long)json#'ts' AS TS,
        toDate(json#'ts') AS DATE;

store scene into 'hdfs:/home/raw/report2/total-scene';

--temporarily disabled
--rmf /home/raw/report/
--fs -mv /home/raw/report2/. /home/raw/report
--rmf /home/raw/report2
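The `LOAD` path above uses `{$START..$END}`, which is shell-style range syntax. Hadoop's path globbing supports comma alternation such as `{a,b,c}`. As far as I know, it does not support a `{start..end}` range, so the directory names may need to be listed explicitly. Below is a small Python helper that builds such a pattern. It assumes the directory names to load are already known, for example from a listing, and that they sit directly under a single base directory.

```python
def alternation_glob(base, names):
    """Build a Hadoop-style '{a,b,c}' alternation path under base, one entry per directory.

    Assumption: every name is a direct child directory of base, and all files
    inside each selected directory should be loaded (hence the trailing '/*').
    """
    if not names:
        raise ValueError("no directories to load")
    # '{{' and '}}' are literal braces in an f-string; the middle braces hold the join.
    return f"{base.rstrip('/')}/{{{','.join(names)}}}/*"
```

The resulting string could then be passed to Pig as a single literal parameter and used inside the `LOAD` quotes. A parameter name such as `INPUT` would be hypothetical.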

Pig keeps treating the substituted parameter as a path string instead of reading that file's content.
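This follows from how `-param` works. The substitution is textual, so `$START` becomes the string `/home/raw/report/start-ts/part-m-00000` itself, not the timestamp stored in that file. One way around this is to read the values first and pass them to Pig as literals. Below is a minimal Python driver sketch. It assumes that the `hdfs` and `pig` command-line tools are on the PATH, that each part file holds a single value on its first non-empty line, and that a local copy of `update_test.pig` exists. All three are assumptions, not part of the original setup.

```python
import subprocess


def parse_single_value(text):
    """Return the first non-empty line of a part file, stripped of whitespace."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            return line
    raise ValueError("part file is empty")


def read_hdfs_value(path):
    """Read a one-value part file from HDFS via the `hdfs dfs -cat` CLI."""
    out = subprocess.run(
        ["hdfs", "dfs", "-cat", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_single_value(out)


def build_pig_command(script, params):
    """Build a `pig -param KEY=VALUE ... script` argument list with literal values."""
    cmd = ["pig"]
    for key, value in params.items():
        cmd += ["-param", f"{key}={value}"]
    cmd.append(script)
    return cmd


def run_update(script, start_path, end_path):
    """Read START/END from HDFS, then launch Pig with them as literal parameters."""
    params = {"START": read_hdfs_value(start_path), "END": read_hdfs_value(end_path)}
    subprocess.run(build_pig_command(script, params), check=True)
```

Call it as `run_update("update_test.pig", "/home/raw/report/start-ts/part-m-00000", "/home/raw/report/end-ts/part-r-00000")`. Pig then sees `$START` and `$END` replaced by the stored values rather than by the file paths.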

What am I doing wrong?

Thanks

kenlz