Hadoop MapReduce - Join of two files and Computation on grouped values

Question

I am fairly new to Hadoop and MapReduce programming. I want to know whether it is possible to group by another value (not the join key) after joining two files.

I have two files which have following data

File1

name    gender
A       Male
B       Male
C       Female

File2

name    marks
A       25
B       28
A       30
C       22

Is there any way to find the percentage of marks for each gender? I am trying to get the following output:

Male    percentage_of_marks_of_male_students
Female  percentage_of_marks_of_female_students

Is there any way to do this in a single job? I've tried using two jobs for this, but couldn't make any headway.

Any tips would be appreciated.

Edit:

After joining the files I get something like this

{name1 - ["gender","marks1","marks2",...]}
{name2 - ["gender","marks1","marks2",...]}
{name3 - ["gender","marks1","marks2",...]}
...

I'm currently stuck on computing the sum of marks for males and females separately in the reducer phase.
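One way to get past this step is to stop keying on the name once the join is done: a follow-up map pass can emit the gender as the new key with that student's summed marks as the value, and a reducer then adds up the values per gender. The sketch below simulates this in plain Python rather than with the Hadoop API. The function names `map_joined_record` and `reduce_by_gender` are hypothetical, and the joined records are assumed to look like the format shown above, filled in with the sample data from File1 and File2.

```python
from collections import defaultdict

def map_joined_record(name, values):
    """Re-key one joined record: drop the name, emit (gender, sum of marks)."""
    gender, *marks = values  # first element is the gender, the rest are marks
    yield gender, sum(int(m) for m in marks)

def reduce_by_gender(pairs):
    """Add up the per-student sums for each gender key."""
    totals = defaultdict(int)
    for gender, marks in pairs:
        totals[gender] += marks
    return dict(totals)

# Joined records in the assumed {name: [gender, marks1, marks2, ...]} shape.
joined = {
    "A": ["Male", "25", "30"],
    "B": ["Male", "28"],
    "C": ["Female", "22"],
}

pairs = [
    pair
    for name, values in joined.items()
    for pair in map_joined_record(name, values)
]
totals = reduce_by_gender(pairs)
print(totals)  # {'Male': 83, 'Female': 22}
```

In a real job the re-keying would happen in a mapper and the summation in a reducer; the point of the sketch is only that the grouping key changes from name to gender between the two phases.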

Edit:

I have solved the problem using two jobs. The first job joins the two files and outputs

[gender, the sum of marks of each student]

I passed that output file as input to the second job, which computes the percentage of marks by gender.
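For readers who want to see the two-job flow end to end, here is a minimal sketch in plain Python that simulates it on the sample data. It is not Hadoop code. The names `job1_map`, `job1_reduce`, and `job2_reduce`, and the source tags `"gender"` and `"marks"`, are made up for illustration. It also assumes that "percentage" means a gender's summed marks divided by the total marks of all students, and it computes that grand total inside the second job.

```python
from collections import defaultdict

file1 = ["A Male", "B Male", "C Female"]   # name, gender
file2 = ["A 25", "B 28", "A 30", "C 22"]   # name, marks

def job1_map(line, source):
    """Tag each value with the file it came from, keyed by name."""
    name, value = line.split()
    return name, (source, value)

def job1_reduce(name, tagged_values):
    """Reduce-side join: one name in, one (gender, summed marks) pair out."""
    gender, total = None, 0
    for source, value in tagged_values:
        if source == "gender":
            gender = value
        else:
            total += int(value)
    return gender, total

def job2_reduce(gender_totals):
    """Group job 1's output by gender and convert sums to percentages."""
    by_gender = defaultdict(int)
    for gender, total in gender_totals:
        by_gender[gender] += total
    grand_total = sum(by_gender.values())
    return {g: 100.0 * t / grand_total for g, t in by_gender.items()}

# Simulated shuffle for job 1: group the tagged values by name.
grouped = defaultdict(list)
for line in file1:
    name, tagged = job1_map(line, "gender")
    grouped[name].append(tagged)
for line in file2:
    name, tagged = job1_map(line, "marks")
    grouped[name].append(tagged)

job1_output = [job1_reduce(name, values) for name, values in grouped.items()]
percentages = job2_reduce(job1_output)
for gender, pct in percentages.items():
    print(f"{gender}\t{pct:.2f}")
# Male    79.05
# Female  20.95
```

On the sample data the male students total 83 marks and the female students 22, out of 105 overall, which gives the percentages printed above.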

Can you give us an example of what the result of joining the two files would look like, since you say that you want to make this job _after_ the join has been made? — Coursal, Jan 21 '21 at 11:42
After joining I'm expecting to get in this format {name1 - ["gender","marks1","marks2",...]} {name2 - ["gender","marks1","marks2",...]} {name3 - ["gender","marks1","marks2",...]} .... — Mark, Jan 22 '21 at 08:16
What do you mean by _percentage_ of marks? Like the sum of them, or the sum of them by the total number of marks? — Coursal, Jan 22 '21 at 12:12
I meant it as sum of them by total marks. I have solved the problem. I used a counter to sum up the total marks and in the second job I found the percentage. Thank you for taking time to help me. — Mark, Jan 24 '21 at 09:19
