I often see people use GROUP and JOIN for the same problem. Suppose I have a students table and a scores table, and I want to find each student's name together with their course scores. It seems this can be solved with either JOIN or GROUP. What are the pros and cons of the two approaches? The data structure and code are below. Thanks.

table students:

student ID, student name, student email address

score table:

student ID, course ID, score

student_scores = group students by (studentId) inner, scores by (studentId);

student_scores = join students by studentId, scores by studentId;
Lin Ma
  • Possible duplicate of [Join vs COGROUP in PIG](http://stackoverflow.com/questions/7496029/join-vs-cogroup-in-pig) – rahulbmv Mar 14 '16 at 08:47
  • @rahulbmv, nice reference, upvoted. :) But I am asking about GROUP vs. JOIN, and you are referring to COGROUP? Thanks. – Lin Ma Mar 15 '16 at 02:47
  • @rahulbmv, I am also confused about what "the foreign key" means in the comment "Both need to send all of the records forward with the key being the foreign key." If you could show an example, that would be great. – Lin Ma Mar 15 '16 at 02:51

1 Answer

The Pig Latin manual's section on JOIN says:

Note the following about the GROUP/COGROUP and JOIN operators:

The GROUP and JOIN operators perform similar functions. GROUP creates a nested set of output tuples while JOIN creates a flat set of output tuples.
The GROUP/COGROUP and JOIN operators handle null values differently (see Nulls and JOIN Operator).

I'm not sure whether this amounts to pros and cons, but the two operators are different.
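To make the nested-versus-flat distinction concrete, here is a minimal sketch using the two tables from the question. The file names, the comma delimiter, and the sample rows are assumptions for illustration only, and the rows shown in comments are what I would expect under those assumptions (the order of tuples is not guaranteed). The multi-relation form is written as COGROUP, which the manual treats as the same operator as GROUP.

```pig
-- Assumed inputs (hypothetical comma-delimited files):
--   students.txt: 1,Ann,ann@example.com
--                 2,Bob,bob@example.com
--                 3,Cid,cid@example.com
--   scores.txt:   1,math,90
--                 1,art,85
--                 2,math,70
--                 4,math,60
students = LOAD 'students.txt' USING PigStorage(',')
           AS (studentId:int, name:chararray, email:chararray);
scores   = LOAD 'scores.txt' USING PigStorage(',')
           AS (studentId:int, courseId:chararray, score:int);

-- Nested result: one tuple per key, holding one bag per input relation.
-- INNER on students drops keys whose students bag is empty (studentId 4),
-- while student 3 is kept with an empty scores bag.
grouped = COGROUP students BY studentId INNER, scores BY studentId;
-- (1,{(1,Ann,ann@example.com)},{(1,math,90),(1,art,85)})
-- (2,{(2,Bob,bob@example.com)},{(2,math,70)})
-- (3,{(3,Cid,cid@example.com)},{})

-- Flat result: one tuple per matching (student, score) pair.
-- Keys present on only one side (3 and 4) produce no output.
joined = JOIN students BY studentId, scores BY studentId;
-- (1,Ann,ann@example.com,1,math,90)
-- (1,Ann,ann@example.com,1,art,85)
-- (2,Bob,bob@example.com,2,math,70)

-- Flattening both bags turns the nested result into the same rows as the
-- inner join, because FLATTEN of an empty bag emits no rows.
flattened = FOREACH grouped GENERATE FLATTEN(students), FLATTEN(scores);

DUMP grouped;
DUMP joined;
DUMP flattened;
```

In this sketch the grouped form looks like the better fit when you want something per student, such as an aggregate over the scores bag or a record of students with no scores, while the join is the more direct choice when you want one row per student and course.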

Mzf
  • Thanks Mzf, my question is specifically how they are different in my sample. Want to learn the differences. :) – Lin Ma Mar 15 '16 at 18:52