I need to map strings to integer ids. I was considering writing a UDF and passing each string through it. For this to work I need to have a single mapper.

How do I limit the number of mappers to 1?

Thanks for the help

oleber
  • Can you describe your problem in more detail? I've encountered a number of cases where the number of reducers needed to be restricted, but I don't see why one would care about the number of mappers. – Olaf Aug 29 '12 at 15:12

1 Answer

I understand what you're trying to do, but your UDF-based approach won't scale very well because the string-to-id table has to reside in memory. You might have an easier time using a map-reduce job that passes the strings from the mapper to a single reducer. The reducer instance just keeps an incrementing counter: each call to the reduce method receives all occurrences of one string, and the reducer assigns that string the next value of the counter.
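
To make the idea concrete, here is a minimal, framework-free sketch of what that single reducer does. It is plain Python rather than Hadoop API code, and the name `assign_ids` is made up for illustration. Sorting and grouping stand in for the shuffle, which hands the reducer each distinct string exactly once.

```python
from itertools import groupby


def assign_ids(strings, start=0):
    """Simulate a single reducer that numbers each distinct string.

    Assumption: sorted() + groupby() stand in for the shuffle phase,
    which delivers every distinct key to the reducer exactly once.
    """
    counter = start  # the reducer's incrementing counter
    mapping = {}
    for key, _occurrences in groupby(sorted(strings)):
        # One "reduce call" per distinct string: give it the next id.
        mapping[key] = counter
        counter += 1
    return mapping


ids = assign_ids(["pear", "apple", "pear", "fig", "apple"])
print(ids)  # {'apple': 0, 'fig': 1, 'pear': 2}
```

Because only one counter exists, the ids are unique without any coordination. With more than one reducer, each would need its own disjoint id range.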

Maybe someone else knows how to limit the input format to producing a single split (to get a single mapper).

Chris Gerken