Let's say I have a table with the columns (A, B, C).
How would I write a Pig statement that groups by column A, keeps only the rows where column B > 100, and then keeps only the groups where the count of distinct values in column C is > 3?
Here is what I have so far.
I first removed the rows where B is 100 or less:
filter_column = FILTER data BY B > 100;
Then I grouped by A:
group_1 = GROUP filter_column by A;
How would I now filter group_1 to keep only the groups where the count of distinct values in column C is > 3?
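One direction that might work is a nested FOREACH that applies DISTINCT to column C inside each group, followed by a FILTER on the resulting count. The following is only a minimal sketch of that idea: the LOAD path, the delimiter, the schema types, and the aliases distinct_c, counts, distinct_c_count, and result are all assumptions made up for illustration, not part of my actual script.

```pig
-- Hypothetical input: path, delimiter, and field types are assumptions.
data = LOAD 'input.tsv' USING PigStorage('\t') AS (A:chararray, B:int, C:chararray);

-- Keep only the rows where B is greater than 100.
filter_column = FILTER data BY B > 100;

-- Group the remaining rows by A.
group_1 = GROUP filter_column BY A;

-- For each group, count the distinct values of C.
counts = FOREACH group_1 {
    distinct_c = DISTINCT filter_column.C;
    GENERATE group AS A, COUNT(distinct_c) AS distinct_c_count;
};

-- Keep only the groups with more than 3 distinct values of C.
result = FILTER counts BY distinct_c_count > 3;

DUMP result;
```

Note that COUNT skips tuples whose first field is null, so a null C would not be counted as a distinct value here; COUNT_STAR could be used instead if nulls should count.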