I have written the following Pig script. How can I rewrite it in nested form?

input = LOAD '/path/to/input/data' USING PigStorage('\t') AS (id:chararray, category:chararray);

grp = GROUP input BY category;

grp_count = FOREACH grp GENERATE group, COUNT(input);

grp_ordered = ORDER grp_count BY $1 DESC;

top_grp = LIMIT grp_ordered 5;

2 Answers

It's pretty simple: put the GROUP inside the FOREACH. Take a look at the grp_count relation:

input = LOAD '/path/to/input/data' USING PigStorage('\t') AS (id:chararray, category:chararray);

grp_count = FOREACH (GROUP input BY category)
            GENERATE FLATTEN(group) AS category,
                     COUNT(input) AS cnt;

grp_ordered = ORDER grp_count BY $1 DESC;

top_grp = LIMIT grp_ordered 5;

If I understand your question correctly, the code below uses a nested FOREACH block.

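data = LOAD 'data' USING PigStorage() AS (id, category);
grp = GROUP data BY category;

grp_count = FOREACH grp {
    -- $1 is category, so every row within a group has the same sort key
    ord = ORDER data BY $1 DESC;
    -- keep at most 5 rows per category
    top_grp = LIMIT ord 5;
    -- the count is therefore capped at 5 for each category
    GENERATE FLATTEN(group), COUNT(top_grp.$1);
};

DUMP grp_count;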