Does anyone know how to do stratified sampling (see Wikipedia) in Pig?
At the moment, I do something like:
relation2 = SAMPLE relation1 0.05;
but my dataset contains a label column with a few distinct values, some of which are rare (0.5%, for example), and I would like my random downsampling not to lose them entirely.
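To make the goal concrete, here is a rough sketch of the per-label behaviour I am after. It assumes the column is a chararray field named `label` and that `'rare'` is one of the rare values. Both names are made up for illustration, and so are the sampling rates. It is not a general solution, since it needs one branch per label value:

```pig
-- Hypothetical field name `label` and value 'rare'; adapt to the real schema.
-- Rows whose label is null match neither condition and are dropped.
SPLIT relation1 INTO rare_rows IF label == 'rare', common_rows IF label != 'rare';

-- Sample each stratum at its own rate (illustrative rates).
rare_sample = SAMPLE rare_rows 0.5;
common_sample = SAMPLE common_rows 0.05;

-- Recombine the strata into one down-sampled relation.
relation2 = UNION rare_sample, common_sample;
```

SAMPLE is probabilistic, so even at a higher rate a very small stratum can still come back empty, and the rare rows end up over-represented relative to the original distribution.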
Thanks a lot.