8

How can I check in Pig if a bag contains an element?

Example: in a bag of chararray values, how can I check whether a given token is present?

admdrew
Nitish Upreti

1 Answer

5

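In Apache Pig you can nest statements inside a FOREACH block (see Pig Basics). Here is an example from the documentation, where A is a bag in B. The 'xyz' after BY appears to stand in for a real filter condition, such as $0 == 'xyz':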

X = FOREACH B {
        S = FILTER A BY 'xyz';
        GENERATE COUNT (S.$0);
}

Instead of COUNT, you can use IsEmpty together with the ?: (bincond) operator:

X = FOREACH B {
        S = FILTER A BY 'xyz';
        GENERATE (IsEmpty(S.$0) ? 'xyz NOT PRESENT' : 'xyz PRESENT') as present, B;
}

Or, to keep only the bags that contain the data:

X = FOREACH B {
        S = FILTER A BY 'xyz';
        GENERATE B, S;
}
F = FILTER X BY not IsEmpty(S);
R = FOREACH F GENERATE B;

This avoids a costly self-join, since every extra join means an extra MapReduce job.
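
As a minimal sketch of the IsEmpty approach with a concrete schema: the file name 'data' and the field names id and token are assumptions for illustration, not taken from the question. Here the grouping key is projected as group, rather than by referring to the relation alias inside the nested block:

```pig
-- Hypothetical input: tab-separated lines of (id, token).
A = LOAD 'data' AS (id:chararray, token:chararray);

-- Each record of B is (group, A), where A is the bag of tuples sharing that id.
B = GROUP A BY id;

X = FOREACH B {
        -- Keep only the tuples of the bag whose token matches.
        S = FILTER A BY token == 'xyz';
        GENERATE group AS id,
                 (IsEmpty(S) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present;
}

-- Keep only the groups whose bag contains the token.
F = FILTER X BY present == 'xyz PRESENT';
DUMP F;
```

The nested FILTER runs inside the same FOREACH pass over the grouped data, so the membership check itself does not need a join.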

disco crazy
alexeipab
  • In Pig 0.15 you can't project B from a nested expression. This did not work for me: X = FOREACH B { S = FILTER A BY 'xyz'; GENERATE B, S; } – hellraiser Oct 14 '19 at 03:49