I am fairly new to Hadoop and MapReduce programming. I want to know whether it is possible to group by another value (not the key) after joining two files.

I have two files with the following data:

File1

name    gender
A       Male
B       Male
C       Female

File2

name    marks
A       25
B       28
A       30
C       22

Is there a way to find the percentage of marks for each gender? I am trying to get the following output:

Male    percentage_of_marks_of_male_students
Female  percentage_of_marks_of_female_students

Is there any way to do this in a single job? I've tried using two jobs for this, but couldn't make any headway.

Any tips would be appreciated.

Edit:

After joining the files I get something like this:

{name1 - ["gender","marks1","marks2",...]}
{name2 - ["gender","marks1","marks2",...]}
{name3 - ["gender","marks1","marks2",...]}
...

I'm currently stuck on finding the sum of marks for males and females separately in the reduce phase.
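One possible way past this point, sketched in plain Python rather than the Hadoop API: have the join step emit the gender as the new key instead of the name, so that a later grouping step sums per gender. The function name `rekey_by_gender` and the in-memory dict are hypothetical and only for illustration. The sketch also assumes the gender is the first element of each joined list, as in the format above.

```python
from collections import defaultdict

def rekey_by_gender(joined):
    """Turn {name: [gender, mark1, mark2, ...]} into {gender: sum of marks}."""
    totals = defaultdict(int)
    for name, values in joined.items():
        # Assumption: gender comes first, the remaining entries are marks.
        gender, marks = values[0], values[1:]
        # Use gender as the new key so marks accumulate per gender, not per name.
        totals[gender] += sum(int(m) for m in marks)
    return dict(totals)

# Joined records built from the sample data in the question.
joined = {
    "A": ["Male", "25", "30"],
    "B": ["Male", "28"],
    "C": ["Female", "22"],
}
gender_totals = rekey_by_gender(joined)
print(gender_totals)
```

In an actual reducer the values for a key are not guaranteed to arrive with the gender first, so the records would probably need a tag marking which file they came from.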

Edit:

I have solved the problem. I used two jobs. The first job joins the two files and gives output as

[gender, the sum of marks of each student]

I sent the output file as input to the second job, which gives the percentage of marks by gender.
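For reference, the two-job flow can be approximated with a small plain-Python simulation. This is only a sketch and not the actual Hadoop code: the function names `job1_join` and `job2_percentage` are hypothetical, the files are represented as in-memory lists, and "percentage" is assumed to mean a gender's sum of marks divided by the sum of all marks.

```python
from collections import defaultdict

def job1_join(file1, file2):
    """Stage 1: join on name, emit (gender, sum of that student's marks)."""
    gender_of = dict(file1)            # name -> gender, from File1
    per_student = defaultdict(int)
    for name, mark in file2:           # File2 may list a name several times
        per_student[name] += mark
    return [(gender_of[name], total) for name, total in per_student.items()]

def job2_percentage(pairs):
    """Stage 2: group by gender, then divide by the grand total of all marks."""
    by_gender = defaultdict(int)
    for gender, marks in pairs:
        by_gender[gender] += marks
    # Assumption: the grand total plays the role of a job-wide sum of all marks.
    grand_total = sum(by_gender.values())
    return {g: 100.0 * s / grand_total for g, s in by_gender.items()}

# Sample data from the question.
file1 = [("A", "Male"), ("B", "Male"), ("C", "Female")]
file2 = [("A", 25), ("B", 28), ("A", 30), ("C", 22)]

stage1 = job1_join(file1, file2)
percentages = job2_percentage(stage1)
for gender, pct in percentages.items():
    print(f"{gender}\t{pct:.2f}")
```

With the sample data the male students account for 83 of the 105 total marks and the female student for 22 of 105.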

Mark
  • Can you give us an example of what the result of joining the two files would look like, since you say that you want to make this job _after_ the join has been made? – Coursal Jan 21 '21 at 11:42
  • After joining I'm expecting to get in this format {name1 - ["gender","marks1","marks2",...]} {name2 - ["gender","marks1","marks2",...]} {name3 - ["gender","marks1","marks2",...]} .... – Mark Jan 22 '21 at 08:16
  • What do you mean by _percentage_ of marks? Like the sum of them, or the sum of them by the total number of marks? – Coursal Jan 22 '21 at 12:12
  • I meant it as sum of them by total marks. I have solved the problem. I used a counter to sum up the total marks and in the second job I found the percentage. Thank you for taking time to help me. – Mark Jan 24 '21 at 09:19

0 Answers