3

in a relation X with only 1 row

X.A=null
X.B= "blahblah"

now I want to do:

Y = FILTER X BY A != B ;

I intended to say that since A is null and B is not, the condition should be true. but the actual result is that Y is empty, and the condition evaluates to false.

this is the same as SQL, where any condition involving null is false. while SQL can translate null with the NVL() function , PIG does not seem to have that , is there a clean way to do the above comparison?

om-nom-nom
  • 62,329
  • 13
  • 183
  • 228
teddy teddy
  • 3,025
  • 6
  • 31
  • 48

2 Answers2

2

It's a bit of a hack, but you can do a comparison and store the result in your relation, then filter using the result and your original comparison.

Y = foreach X generate *,
        (((A is null and B is not null) or (A is not null and B is null) ? 'y' : 'n') as oneNull;
Y = filter Y by (oneNull == 'y' or A != B);
Y = foreach Y generate A, B;

Using this, if X is:

(,blahblah)
(blah,blah)
(,)
(blah,)
(abc,def)

Then Y will be:

(,blahblah)
(blah,)
(abc,def)
cyang
  • 5,574
  • 2
  • 25
  • 34
0

I have Pig version 0.12.1 installed; it actually works for me to write the condition directly:

Y = filter X by (A is null and B is not null) or (A is not null and B is null) or A != B;

Note that for the null/null case, the expression A != B evaluates to null, and so the filter excludes the tuple, according to

http://pig.apache.org/docs/r0.12.1/basic.html#nulls.

stefan.schroedl
  • 866
  • 9
  • 19