This question is similar to one asked two years ago, but for some reason that solution does not work for me. It is actually a combination of two ideas (two answered questions), as given in the title. The example below replicates the accepted solution, yet it fails for me. What is my mistake? Here is a complete, self-contained, reproducible example.
Here is the data:
cat in_detail.csv
grp,val
1,2.1,
1,4.2,
1,6.3
2,6.5
2,1.2
2,4.3
2,3.2
cat in_cnt.csv
grp,cnt
1,2
2,3
Expected output (sort order is not important):
grp,val
1,2.1,
1,4.2,
2,6.5
2,1.2
2,4.3
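To make the intended selection precise, here is a minimal sketch of the same logic outside Pig, in plain Python. It is only a reference for the expected result, not a Pig solution. The names `limit_per_group`, `DETAIL_CSV`, and `CNT_CSV` are hypothetical, the sample data is copied from above with the trailing commas dropped, and it assumes that "the first cnt rows in input order" is an acceptable choice and that groups missing from in_cnt.csv are dropped (as an inner join would do).

```python
import csv
import io
from collections import defaultdict

# Sample data copied from in_detail.csv and in_cnt.csv above
# (assumption: the trailing commas in the first two rows are not significant).
DETAIL_CSV = """grp,val
1,2.1
1,4.2
1,6.3
2,6.5
2,1.2
2,4.3
2,3.2
"""

CNT_CSV = """grp,cnt
1,2
2,3
"""


def limit_per_group(detail_text, cnt_text):
    """Keep at most `cnt` rows of each `grp`, in input order.

    Assumption: groups without an entry in the count file are dropped.
    """
    # Per-group row limit, e.g. {"1": 2, "2": 3}.
    limits = {
        row["grp"]: int(row["cnt"])
        for row in csv.DictReader(io.StringIO(cnt_text))
    }
    kept = defaultdict(int)  # rows already emitted per group
    result = []
    for row in csv.DictReader(io.StringIO(detail_text)):
        grp = row["grp"]
        if kept[grp] < limits.get(grp, 0):
            kept[grp] += 1
            result.append((grp, float(row["val"])))
    return result


if __name__ == "__main__":
    for grp, val in limit_per_group(DETAIL_CSV, CNT_CSV):
        print(f"{grp},{val}")
```

The point of the sketch is that the limit differs per group and comes from the second file, which is exactly what I am trying to express in the Pig script.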
Here is the Pig code, with the error message shown in the comments near the end:
detail1 = LOAD '/tmp/sD_mvmd/c0nelha/data/in_detail.csv' using PigStorage(',') as (grp:chararray,num:double);
cnt1 = LOAD '/tmp/sD_mvmd/c0nelha/data/in_cnt.csv' using PigStorage(',') as (grp:chararray,cnt:int);
d_group = GROUP detail1 by (grp);
describe d_group;
--d_group: {group: chararray,detail1: {(grp: chararray,num: double)}}
describe cnt1;
--cnt1: {grp: chararray,cnt: int}
detail2 = JOIN d_group by (group), cnt1 by (grp);
describe detail2;
--detail2: {d_group::group: chararray,d_group::detail1: {(grp: chararray,num: double)},cnt1::grp: chararray,cnt1::cnt: int}
detail3 = FOREACH detail2 {
    mySelection = LIMIT d_group::detail1 detail2.cnt1::cnt;
    GENERATE mySelection;
}
dump detail3;
-- Apache Pig version 0.12.1.2.1.5.0-695 (rexported) compiled Aug 27 2014, 23:56:19
-- Backend error : Scalar has more than one row in the output.
-- 1st : (1,{(1,6.3),(1,4.2),(1,2.1)},1,2), 2nd :(2,{(2,3.2),(2,4.3),(2,1.2),(2,6.5)},2,3)