It seems that the map.input.start property doesn't give me the position of the start of a line (except, of course, the first map.input.start, which is 0). Sometimes map.input.start falls somewhere in the middle of the first line of the mapper's input; sometimes it falls somewhere in the middle of the last line of the previous mapper's input. Is this to be expected? If so, how can I get the byte offsets of lines? Using TextInputFormat doesn't work, because I'm using Hadoop streaming, which discards the key passed to the mapper.
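To make the behaviour I'm describing concrete, here is a minimal sketch of how I understand a line-oriented record reader to treat a byte-based split. It is a simplified model, not Hadoop's actual code: `read_split` and its parameters are hypothetical names, and the exact boundary rules are an assumption that may differ between Hadoop versions.

```python
def read_split(data: bytes, start: int, length: int):
    """Yield (byte_offset, line) pairs for one split of `data`.

    Simplified, assumed model of a line-oriented record reader:
    split boundaries are plain byte positions, so the reader has to
    realign itself to the next line start.
    """
    end = start + length
    pos = start
    if start != 0:
        # Back up one byte and discard everything through the next newline.
        # A line that begins exactly at `start` is kept; a partial line is
        # skipped, because the previous split reads it to completion.
        nl = data.find(b"\n", start - 1)
        pos = len(data) if nl == -1 else nl + 1
    # Emit every line that *starts* before `end`, reading past `end`
    # if needed to finish the last line.
    while pos < len(data) and pos < end:
        nl = data.find(b"\n", pos)
        nxt = len(data) if nl == -1 else nl + 1
        yield pos, data[pos:nxt].rstrip(b"\n")
        pos = nxt


if __name__ == "__main__":
    sample = b"alpha\nbeta\ngamma\ndelta\n"
    for split_start in (0, 8, 16):
        print(split_start, list(read_split(sample, split_start, 8)))
```

Under this model, the offset of the first line a mapper sees is generally greater than the split start, which would match what I observe. In a streaming mapper the property is exposed as the environment variable `map_input_start` (dots replaced by underscores), but the number of bytes skipped before the first line is not passed along, which is why I can't reconstruct the offsets from it alone.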

Vyassa Baratham
- What version of hadoop are you using? – Chris White Jul 11 '12 at 15:27
- 0.20.5 you say - http://hadoop.apache.org/common/releases.html, do you mean 0.20.205.0? – Chris White Jul 11 '12 at 17:06