I am new to Apache Pig. I want to split and flatten the following input so that, for each product, the output lists every user who viewed it.
My input (UserId, ProductIds):
12345 123456,23456,987653
23456 23456,123456,234567
34567 234567,765678,987653
My required output (ProductId, UserId):
123456 12345
123456 23456
23456 12345
23456 23456
987653 12345
987653 34567
234567 23456
234567 34567
765678 34567
My Pig script:
a = load '/home/hadoopuser/ips' using PigStorage('\t') as (key:chararray, val:chararray);
b = foreach a generate key as ky1, FLATTEN(TOKENIZE(val)) as vl1;
c = group b by vl1;
d = foreach c generate group as vl2, $1 as ky2;
e = foreach d generate vl2, BagToString(ky2) as kyy;
f = foreach e generate vl2 as vl3,FLATTEN(STRSPLIT(kyy,'_')) as ky3;
g = foreach f generate vl3, FLATTEN(TOKENIZE(ky3)) as kk1;
dump g;
I got the following output, which keeps only one user per product and drops the other users of products that were viewed more than once:
(23456,12345)
(123456,12345)
(234567,23456)
(765678,34567)
(987653,12345)
I don't know how to fix this. Can anyone help me get one row per (ProductId, UserId) pair, and is there a simpler way to do it?
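One simpler direction that might work, offered as an untested sketch rather than a confirmed fix: relation b in the script above already seems to hold one (user, product) pair per row, so the group / BagToString / STRSPLIT steps may be unnecessary. It looks as if FLATTEN(STRSPLIT(kyy,'_')) produces several fields and ky3 then refers only to the first one, which would explain the missing rows. The sketch below just swaps the two columns. The field names user_id and product_id are illustrative, and it assumes that TOKENIZE splits on commas by default, as the first step of the original script appears to rely on.

```pig
-- Load one line per user: user id, a tab, then a comma-separated product list
a = load '/home/hadoopuser/ips' using PigStorage('\t')
        as (user_id:chararray, products:chararray);

-- TOKENIZE turns the product list into a bag of single-field tuples;
-- FLATTEN emits one output row per product, repeating user_id on each row
b = foreach a generate FLATTEN(TOKENIZE(products)) as product_id, user_id;

dump b;
```

If this behaves as expected, each input line yields one (product_id, user_id) row per product, so products viewed by several users appear once per user. The row order is not guaranteed to match the required output above; an `order b by product_id` step could be added if the order matters.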