I have a file like this:

1,Mary,5
1,Tom,5
2,Bill,5
2,Sue,4
2,Theo,5
3,Mary,5
3,Cindy,5
4,Andrew,4
4,Katie,4
4,Scott,5
5,Jeff,3
5,Sara,4
5,Ryan,5
6,Bob,5
6,Autumn,4
7,Betty,5
7,Janet,5
7,Scott,5
8,Andrew,4
8,Katie,4
8,Scott,5
9,Mary,5
9,Tom,5
10,Bill,5
10,Sue,4
10,Theo,5
11,Mary,5
11,Cindy,5
12,Andrew,4
12,Katie,4
12,Scott,5
13,Jeff,3
13,Sara,4
13,Ryan,5
14,Bob,5
14,Autumn,4
15,Betty,5
15,Janet,5
15,Scott,5
16,Andrew,4
16,Katie,4
16,Scott,5 

I want to find the name that appears most often, i.e. the maximum: (Scott,6).

user234202

1 Answer

There's some ambiguity in your question. What exactly do you want?

Do you want a list of per-user counts in descending order?

OR

Do you want just (Scott,6), i.e. only the one user with the maximum count?

I have solved both cases and tested them on the sample data you gave.

If you want the first, then:

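-- load the comma-separated file: id, name, number
a = load '/file.txt' using PigStorage(',') as (id:int, name:chararray, number:int);
-- count how many rows each name appears in
g = group a by name;
g1 = foreach g generate group as name, COUNT(a) as cnt;
-- put all counts into one bag and sort it by count, highest first
toptemp = group g1 all;
final = foreach toptemp {
    sorted = order g1 by cnt desc;
    generate flatten(sorted);
};
dump final;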

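This gives you the list of names in descending order of count. Names with the same count (for example Katie, Andrew and Mary) are not guaranteed to come out in any particular order: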

(Scott,6)
(Katie,4)
(Andrew,4)
(Mary,4)
(Bob,2)
(Sue,2)
(Tom,2)
(Bill,2)
(Jeff,2)
(Ryan,2)
(Sara,2)
(Theo,2)
(Betty,2)
(Cindy,2)
(Janet,2)
(Autumn,2)
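If you want to check this list without a Hadoop cluster, the same group-count-sort steps can be sketched in plain Python. This is not Pig; it only mirrors `group a by name`, `COUNT(a)` and `order g1 by cnt desc` on the sample rows from the question. Breaking ties by name is an assumption added here only to make the output deterministic, since Pig's `order` does not fix the order of equal counts.

```python
from collections import Counter

# Sample rows from the question (id,name,number).
SAMPLE_ROWS = [
    "1,Mary,5", "1,Tom,5",
    "2,Bill,5", "2,Sue,4", "2,Theo,5",
    "3,Mary,5", "3,Cindy,5",
    "4,Andrew,4", "4,Katie,4", "4,Scott,5",
    "5,Jeff,3", "5,Sara,4", "5,Ryan,5",
    "6,Bob,5", "6,Autumn,4",
    "7,Betty,5", "7,Janet,5", "7,Scott,5",
    "8,Andrew,4", "8,Katie,4", "8,Scott,5",
    "9,Mary,5", "9,Tom,5",
    "10,Bill,5", "10,Sue,4", "10,Theo,5",
    "11,Mary,5", "11,Cindy,5",
    "12,Andrew,4", "12,Katie,4", "12,Scott,5",
    "13,Jeff,3", "13,Sara,4", "13,Ryan,5",
    "14,Bob,5", "14,Autumn,4",
    "15,Betty,5", "15,Janet,5", "15,Scott,5",
    "16,Andrew,4", "16,Katie,4", "16,Scott,5",
]


def count_names(rows):
    """Rough analogue of `group a by name` + `COUNT(a)`: rows per name."""
    counts = Counter()
    for row in rows:
        _id, name, _number = row.split(",")
        counts[name] += 1
    return counts


def sorted_counts(rows):
    """Rough analogue of `order g1 by cnt desc`; name is an extra tie-breaker."""
    counts = count_names(rows)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, cnt in sorted_counts(SAMPLE_ROWS):
        print(f"({name},{cnt})")
```

Apart from the order of tied names, this produces the same counts as the Pig output: Scott with 6, Andrew, Katie and Mary with 4, and every other name with 2.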

If you want the second, then:

-- load the file and count rows per name, as above
a = load '/file.txt' using PigStorage(',') as (id:int, name:chararray, number:int);
g = group a by name;
g1 = foreach g generate group as name, COUNT(a) as cnt;
toptemp = group g1 all;
final = foreach toptemp {
    sorted = order g1 by cnt desc;
    -- keep only the first (highest-count) tuple
    top = limit sorted 1;
    generate flatten(top);
};
dump final;

This gives only one result (if several names tie for the highest count, only one of them is returned):

(Scott,6)

Thanks, I hope it helps.

ashubhargave
  • Hi ashu, I wanted to show only (Scott,6), but using the MAX function. Can you please help me get the result that way? – user234202 Jan 20 '14 at 13:50
  • I don't think MAX can help us here. Why don't you try the second solution? Did you test it? It works. Can you tell me how this could be achieved using MAX? – ashubhargave Jan 21 '14 at 07:44
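A maximum can be used if it is applied to the per-name counts rather than to the raw rows: first compute the largest count, then keep every name whose count equals it. Unlike `limit sorted 1`, this returns all names that tie for first place. The sketch below shows that logic in plain Python on the sample data; it is not Pig code. In Pig the same idea would presumably be `MAX(g1.cnt)` over `group g1 all`, followed by a `FILTER` on `g1` comparing `cnt` with that single value (for example through Pig's scalar projection of a one-row relation); that translation is an untested assumption here.

```python
from collections import Counter

# Sample rows from the question (id,name,number).
SAMPLE_ROWS = [
    "1,Mary,5", "1,Tom,5",
    "2,Bill,5", "2,Sue,4", "2,Theo,5",
    "3,Mary,5", "3,Cindy,5",
    "4,Andrew,4", "4,Katie,4", "4,Scott,5",
    "5,Jeff,3", "5,Sara,4", "5,Ryan,5",
    "6,Bob,5", "6,Autumn,4",
    "7,Betty,5", "7,Janet,5", "7,Scott,5",
    "8,Andrew,4", "8,Katie,4", "8,Scott,5",
    "9,Mary,5", "9,Tom,5",
    "10,Bill,5", "10,Sue,4", "10,Theo,5",
    "11,Mary,5", "11,Cindy,5",
    "12,Andrew,4", "12,Katie,4", "12,Scott,5",
    "13,Jeff,3", "13,Sara,4", "13,Ryan,5",
    "14,Bob,5", "14,Autumn,4",
    "15,Betty,5", "15,Janet,5", "15,Scott,5",
    "16,Andrew,4", "16,Katie,4", "16,Scott,5",
]


def names_with_max_count(rows):
    """Count rows per name, take the MAX of the counts, keep names reaching it."""
    counts = Counter(row.split(",")[1] for row in rows)
    if not counts:
        return []  # max() of an empty sequence would raise ValueError
    max_cnt = max(counts.values())  # analogue of MAX over all counts
    # analogue of filtering the counts by cnt == max_cnt; ties are kept
    return sorted((name, cnt) for name, cnt in counts.items() if cnt == max_cnt)


if __name__ == "__main__":
    for name, cnt in names_with_max_count(SAMPLE_ROWS):
        print(f"({name},{cnt})")  # prints (Scott,6) for the sample data
```

With the sample data this yields only (Scott,6); if another name also had six rows, both would be returned.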