I am trying to write the program below in Apache Pig, since the data is unstructured.

i) I have a dataset that contains street name, city and state.

ii) I group it by state.

iii) I take the count of rows for each state, so my output looks like statename,count, i.e. how many times each state appears in the dataset.

program:

realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);

A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate);

The output will look like:

CA,14
washington,20

Now I need the max of the counts, so my output should be "washington,20".

How do I proceed? Please help me resolve this.

sivaraj

1 Answer

Apply ORDER and LIMIT to the generated result:

realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);
A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate) AS c;

-- Arrange the tuples by count in descending order
D = ORDER B BY c DESC;

-- Limit the ordered result to one row to get the state with the max count
E = LIMIT D 1;
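
Note that LIMIT 1 returns a single row, so if two states share the highest count only one of them is kept. If ties matter, one possible alternative is to compute the maximum with GROUP ALL and MAX, then filter on it. The following is a minimal sketch, not a tested script: 'DATA' is the placeholder path from the question, and the alias names G, M, T and max_c are made up for illustration.

```pig
-- 'DATA' is the placeholder input path from the question
realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);
A = GROUP realestate BY state;
B = FOREACH A GENERATE group AS state, COUNT(realestate) AS c;

-- Put every (state, c) row into a single group to find the overall maximum
G = GROUP B ALL;
M = FOREACH G GENERATE MAX(B.c) AS max_c;

-- M has exactly one row, so M.max_c can be used as a scalar in the filter;
-- every state whose count equals the maximum is kept
T = FILTER B BY c == (long)M.max_c;
DUMP T;
```

With the sample output from the question this would be expected to keep only the washington row; with tied counts it would keep all of the tied states.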
franklinsijo