
I am trying to work out how to sort my data multiple times without having to go back through the mapper each time.

I'd like to set up: mapper 1 --> reducer 1 --> reducer 2 --> reducer 3

I want reducer 1 to output (key, data) and then have that output go straight to reducer 2. Is this possible?

From troubleshooting I have learned that you can chain jobs, but does this require a mapper for each step?

Whenever I try to run a job without a mapper, it ends with an error. Running a mapper for each step seems like a waste of time and resources if reducer 1 can already output the data as needed.
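For reference, here is a minimal sketch of one "reduce-only" stage in a chain of jobs, assuming the Hadoop 2+ org.apache.hadoop.mapreduce API. Each extra reduce (and so each extra sort) does need its own job, but that job's map phase can be the base Mapper class, which passes every (key, value) pair through unchanged. The class name ReduceStage, the build helper, and the assumption that the previous job wrote its output with SequenceFileOutputFormat and that the reducer keeps the same key/value types are all hypothetical choices for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ReduceStage {

    /**
     * Configures one reduce-only stage of a job chain (hypothetical helper).
     * Assumes the previous stage wrote SequenceFiles of (keyClass, valueClass)
     * pairs and that the reducer emits the same key/value types.
     */
    @SuppressWarnings("rawtypes")
    public static Job build(Configuration conf, String name, Path in, Path out,
                            Class<? extends Reducer> reducerClass,
                            Class<?> keyClass, Class<?> valueClass) throws IOException {
        Job job = Job.getInstance(conf, name);
        job.setJarByClass(ReduceStage.class);

        // Read the previous stage's (key, value) pairs back as typed Writables.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, in);

        // The base Mapper class is an identity mapper: every pair goes through unchanged,
        // so this stage costs a shuffle/sort plus the reducer, with no custom map logic.
        job.setMapperClass(Mapper.class);
        job.setMapOutputKeyClass(keyClass);
        job.setMapOutputValueClass(valueClass);

        job.setReducerClass(reducerClass);
        job.setOutputKeyClass(keyClass);
        job.setOutputValueClass(valueClass);

        // Write SequenceFiles again so a further stage can be chained the same way.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setOutputPath(job, out);
        return job;
    }
}
```

The stages would then be run in order, checking that each one's waitForCompletion(true) returns true before submitting the next, with job 1 also configured to write SequenceFileOutputFormat.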

Thoughts?

user1179295

1 Answer


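In short, if you are using Java, ChainMapper and ChainReducer are what you need. With these classes you can chain any number of mappers before the reducer and any number of mappers after it, all within a single job (the pattern [MAP+ / REDUCE MAP*]). A job still has only one reducer, though, so every additional reduce phase needs its own job.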
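A minimal sketch of the [MAP+ / REDUCE MAP*] wiring, assuming the Hadoop 2+ org.apache.hadoop.mapreduce.lib.chain API (the older org.apache.hadoop.mapred API has its own ChainMapper and ChainReducer with different signatures). The word-splitting TokenMapper, the SumReducer, the FilterMapper with its threshold of 2, and the names ChainExample and buildJob are hypothetical, chosen only to show how the pieces are attached.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainExample {

    // Map step (hypothetical): split each line into words and emit (word, 1).
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    context.write(new Text(token), ONE);
                }
            }
        }
    }

    // Reduce step (hypothetical): sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }

    // Post-reduce step (hypothetical): runs inside the reduce task on each record
    // the reducer emits, keeping only words seen at least twice.
    public static class FilterMapper extends Mapper<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void map(Text word, IntWritable count, Context context)
                throws IOException, InterruptedException {
            if (count.get() >= 2) {
                context.write(word, count);
            }
        }
    }

    public static Job buildJob(Configuration conf, Path in, Path out) throws IOException {
        Job job = Job.getInstance(conf, "chain-example");
        job.setJarByClass(ChainExample.class);

        // MAP+: one or more mappers before the shuffle.
        ChainMapper.addMapper(job, TokenMapper.class,
                LongWritable.class, Text.class, Text.class, IntWritable.class,
                new Configuration(false));

        // REDUCE: exactly one reducer per job.
        ChainReducer.setReducer(job, SumReducer.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class,
                new Configuration(false));

        // MAP*: zero or more mappers applied to the reducer's output.
        ChainReducer.addMapper(job, FilterMapper.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class,
                new Configuration(false));

        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = buildJob(new Configuration(), new Path(args[0]), new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the mappers added with ChainReducer.addMapper process the reducer's output record by record inside the same reduce task. There is no second shuffle, so the data is not re-sorted; if each step must sort by a different key, each sort still needs its own job.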

The book "Hadoop in Action" describes this procedure in chapter 5.

vpap