I have the file below, tax_cal, which I want to load in Pig:

    101,5|2;3|2
    102,3|1;4.5|2;4|1
    103,2|1;5|2;5.6|3

The output I want:

    101,5|2,3|2
    102,3|1,4.5|2,4|1
    103,2|1,5|2,5.6|3

I will then pass this output to a Python UDF to calculate the total price.

How can I accomplish this?
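The reshaping asked for above amounts to keeping the bill number and turning each `;` between `price|tax` pairs into a `,`. Below is a minimal Python sketch of that step. It also includes a parser that splits each pair into `(price, tax)`, following the description in the comments. The function names `to_output_format` and `parse_bill` are hypothetical. The sketch assumes every line has exactly one comma after the bill number and only well-formed `price|tax` pairs. Inside Pig itself, the built-in `REPLACE` function could probably perform the same substitution in a `FOREACH`, but treat that as untested here.

```python
from typing import List, Tuple


def to_output_format(line: str) -> str:
    """Turn '101,5|2;3|2' into '101,5|2,3|2' (';' between pairs -> ',')."""
    bill_no, details = line.split(",", 1)  # split off the bill number only
    return bill_no + "," + details.replace(";", ",")


def parse_bill(line: str) -> Tuple[int, List[Tuple[float, float]]]:
    """Split one input line into the bill number and its (price, tax) pairs."""
    bill_no, details = line.split(",", 1)
    pairs = []
    for item in details.split(";"):   # e.g. '4.5|2'
        price, tax = item.split("|")  # price before '|', tax after it
        pairs.append((float(price), float(tax)))
    return int(bill_no), pairs


for raw in ["101,5|2;3|2", "102,3|1;4.5|2;4|1", "103,2|1;5|2;5.6|3"]:
    print(to_output_format(raw))
# 101,5|2,3|2
# 102,3|1,4.5|2,4|1
# 103,2|1,5|2,5.6|3
```

For example, `parse_bill("102,3|1;4.5|2;4|1")` returns `(102, [(3.0, 1.0), (4.5, 2.0), (4.0, 1.0)])`. A per-bill UDF would most likely need its input in that shape.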

Harshit Kakkar

1 Answer

The basic load command in Pig is below, but I am not sure of the data types in your sample file. Take a look and check whether you can adapt it to your needs.

    A = LOAD '(your_file_name)' USING PigStorage(',') AS (bill_number:INT, tax:chararray); 
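To see why a two-field schema causes trouble, here is a rough Python emulation of `LOAD ... USING PigStorage(',') AS (bill_number:INT, tax:chararray)`. It assumes that fields beyond the declared schema are dropped, as reported later in this thread. It also assumes that a value that cannot be cast becomes null, represented here as `None`. `pigstorage_load` and `SCHEMA` are hypothetical names, not Pig APIs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (field name, cast function) pairs standing in for a Pig AS (...) schema.
Schema = Sequence[Tuple[str, Callable[[str], object]]]

SCHEMA: Schema = [("bill_number", int), ("tax", str)]


def pigstorage_load(line: str, delimiter: str, schema: Schema) -> tuple:
    """Rough emulation of LOAD ... USING PigStorage(delimiter) AS (schema).

    Assumptions: extra fields are dropped, missing fields become None,
    and a failed cast becomes None (standing in for Pig's null).
    """
    fields = line.split(delimiter)
    row: List[Optional[object]] = []
    for i, (_name, cast) in enumerate(schema):
        if i >= len(fields):
            row.append(None)  # line too short: missing field
            continue
        try:
            row.append(cast(fields[i]))
        except ValueError:
            row.append(None)  # value could not be cast
    return tuple(row)  # anything past len(schema) is never read


print(pigstorage_load("101,5|2;3|2", ",", SCHEMA))  # (101, '5|2;3|2')
print(pigstorage_load("101,5|2,3|2", ",", SCHEMA))  # (101, '5|2')
```

With the file as posted, each line contains a single comma, so the whole pair list ends up in `tax` and can be split further (for example with `TOKENIZE`). Once the pairs themselves are comma-separated, only the first pair survives.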
Chetan_Vasudevan
  • No, doing this only gives me the value after '|'. I got this output: (,2;3) (,2;4) (,2;2.5). I don't know how to separate this file :( – Harshit Kakkar Aug 28 '17 at 20:20
  • Well, I don't understand your text file; please post its first 5 lines. – Chetan_Vasudevan Aug 28 '17 at 20:31
  • Hi, this is the file I want to load. To explain it: the first column is the bill number (int), and the second column is an array of tax_details. For example, in 5|2, 5 is the price and 2 is the tax. I have to load this file in Pig and then pass it through a UDF to calculate the total price. – Harshit Kakkar Aug 28 '17 at 20:53
  • So is it separated by tabs or commas? – Chetan_Vasudevan Aug 28 '17 at 21:29
  • Assuming it's comma-separated, I will edit my answer above for the bill number and tax. – Chetan_Vasudevan Aug 28 '17 at 21:43
  • It is separated by commas. – Harshit Kakkar Aug 30 '17 at 18:24
  • No, PigStorage(',') will not solve the issue, since it stops at the first comma after the first pair and truncates the rest, so the output will be: 101,5|2 102,3|1 103,2|1 – Harshit Kakkar Aug 30 '17 at 18:27
  • So I understand that you need 101 as the bill number and 5 as the tax, then again 102 as the bill number and 3 as the tax? Is this right? – Chetan_Vasudevan Aug 30 '17 at 18:35
  • Hi, I did the steps below and they worked, but now I am facing a different problem: A = load 'tax_cal' using PigStorage(' ') as (bill_no:int,tax_details:chararray); B = FOREACH A GENERATE bill_no,FLATTEN(TOKENIZE(tax_details,',')) AS letter; REGISTER 'tax_calulator.py' USING jython as newudf; C = FOREACH B GENERATE bill_no, letter, newudf.price(letter); D = group C by (bill_no,totalprice); E = foreach D generate FLATTEN(group) as (bill_no,totalprice), SUM(C.totalprice) as sum1; – Harshit Kakkar Aug 30 '17 at 22:51
  • On dumping E it gives: (101,9.0,9.0) (101,16.5,16.5) (102,8.0,8.0) (102,10.0,10.0) (102,10.5,10.5) (103,7.5,7.5) (103,18.0,18.0), but I want the sum for each bill_no, like 101, 25.5, and so on. Can you help me with this, please? – Harshit Kakkar Aug 30 '17 at 22:57
  • When applying the FOREACH, just do the regular SUM over bill_no. – Chetan_Vasudevan Aug 30 '17 at 23:05
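The last suggestion comes down to the GROUP key. The comment groups C by (bill_no, totalprice), so each distinct line item forms its own group and SUM simply returns that item's price. The Python sketch below replays the dumped rows to contrast that with grouping on bill_no alone. `sum_by` is a hypothetical helper, not part of Pig. In Pig, the corresponding change would presumably be `D = GROUP C BY bill_no;` followed by `E = FOREACH D GENERATE group AS bill_no, SUM(C.totalprice) AS sum1;`. This assumes the UDF's output field is named totalprice, as the comment's own code implies.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Row = Tuple[int, float]  # (bill_no, totalprice) for one line item

# Per-item rows taken from the DUMP of E in the comment above.
ROWS: List[Row] = [
    (101, 9.0), (101, 16.5),
    (102, 8.0), (102, 10.0), (102, 10.5),
    (103, 7.5), (103, 18.0),
]


def sum_by(rows: List[Row], key: Callable[[Row], Hashable]) -> Dict[Hashable, float]:
    """Group rows by key(row), then sum totalprice within each group."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return {k: sum(price for _bill, price in members) for k, members in groups.items()}


# GROUP C BY (bill_no, totalprice): every distinct item is its own group.
wrong = sum_by(ROWS, key=lambda r: (r[0], r[1]))
# GROUP C BY bill_no: one group, and therefore one total, per bill.
right = sum_by(ROWS, key=lambda r: r[0])

print(len(wrong))  # 7
print(right)       # {101: 25.5, 102: 28.5, 103: 25.5}
```

The 25.5 for bill 101 matches the total asked for in the comment. The other two totals follow from the same dumped rows.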