I need to map strings to integer ids. I was considering writing a UDF and passing each string through it. For this to work I need to have a single mapper.
How do I limit the number of mappers to 1?
Thanks for the help
I understand what you're trying to do, but your UDF-based approach won't scale very well because that string-to-id table is going to have to reside in memory. You might have an easier time of it by using a map-reduce job that passes the strings from the mappers to a single reducer. The reducer instance just keeps an incrementing counter. Each call to the reduce method receives all occurrences of one distinct string, so the reducer assigns that string the next value of the counter and then increments it.
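To make the single-reducer idea concrete, here is a minimal sketch in plain Python that only simulates what the reducer would see. It is not Hadoop or Pig code: the function name `assign_ids` and the starting value of 0 are my own assumptions, and the sort plus `groupby` stands in for the shuffle phase, which delivers each distinct key to the reducer exactly once.

```python
from itertools import groupby


def assign_ids(strings, start=0):
    """Simulate a single reducer that numbers each distinct string.

    Hypothetical helper for illustration only. Sorting and grouping
    mimics the shuffle: the reducer sees each distinct key once,
    together with all of its duplicate occurrences.
    """
    counter = start
    mapping = {}
    for key, _occurrences in groupby(sorted(strings)):
        # One "reduce call" per distinct string: hand out the next id.
        mapping[key] = counter
        counter += 1
    return mapping


if __name__ == "__main__":
    sample = ["pear", "apple", "pear", "fig", "apple"]
    for string, string_id in assign_ids(sample).items():
        print(string, string_id)
```

Note that only the counter has to be kept between reduce calls. The `mapping` dict here exists just so the sketch can return a result; a real reducer would emit each (string, id) pair as it goes rather than hold the whole table in memory.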
Maybe someone else knows how to limit the input format to producing a single split (to get a single mapper).