0

I have a table which contain sample CDR data in that column A and column B having calling person and called person mobile number I need to find whose having maximum number of calls made(column A) and also need to find to which number(column B) called most

the table structure is like below

calling called

  • 889578226 77382596
  • 889582256 77382596
  • 889582256 7736368296
  • 7785978214 782987522

in the above table 889578226 have most number of outgoing calls and 77382596 is most called number in such a way need to get the output

in hive i run like below

SELECT calling_a,called_b, COUNT(called_b) FROM cdr_data GROUP BY calling_a,called_b;

what might be the equalent code for the above query in pig?

Community
  • 1
  • 1
Anas A
  • 199
  • 4
  • 19

1 Answers1

0

Anas, Could you please let me know this is what you are expecting or something different?

input.txt
a,100
a,101
a,101
a,101
a,103
b,200
b,201
b,201
c,300
c,300
c,301
d,400

PigScript:
A = LOAD 'input.txt' USINg PigStorage(',') AS (name:chararray,phone:long);
B = GROUP A BY (name,phone);
C = FOREACH B GENERATE FLATTEN(group),COUNT(A) AS cnt;
D = GROUP C BY $0;
E = FOREACH D {
                SortedList = ORDER C BY cnt DESC;
                top = LIMIT SortedList 1;
                GENERATE FLATTEN(top);
              }
DUMP E;

Output:
(a,101,3)
(b,201,2)
(c,300,2)
(d,400,1)
Sivasakthi Jayaraman
  • 4,724
  • 3
  • 17
  • 27
  • @Anas, Could you please validate the above solution?. FYI..I don't know who put downvote for your question. – Sivasakthi Jayaraman Oct 20 '14 at 07:09
  • 1
    the most irritating thing that one can do is downvote your question or answer and do not leave any comment for doing so. I would expect the downvoting guy to leave a comment so the person asking the question can learn something. Anyways, its hidden who downvoted your question, so can not do anything about it! – Gaurav Phapale Oct 21 '14 at 23:34