I hit a problem when reading a Snappy-compressed file from HDFS in a MapReduce job.
I have set the input format on the job:

job.setInputFormatClass(TextInputFormat.class);
Then I read the line value in the mapper like this:

@Override
protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // split on the tab character (0x09); note that String.valueOf(0x09)
    // would yield "9", so "\t" is used instead
    String[] strs = new String(value.getBytes()).split("\t");
    LOGGER.info("strs length is " + strs.length);
}
If the data is correct, strs.length should be 44, but I see many wrong lengths (greater than 44) in the userlogs.
Then I printed new String(value.getBytes()) and found the string is not what I expected: the line is not the data I saved to HDFS. The bytes appear to be in the wrong order, which leads to the wrong values in the mapper.
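For reference, here is a standalone sketch of what I suspect may be happening. Hadoop's Text reuses its backing byte array across records, and getBytes() returns the whole array, so the tail can contain stale bytes from a previous, longer line; only the first getLength() bytes are valid. The snippet below simulates that with a plain byte array (no Hadoop dependency; the class name TextBufferDemo is made up for the example):

```java
// Simulates why new String(value.getBytes()) can show stale data:
// the backing array is reused, so bytes past the valid length belong
// to an earlier, longer record.
public class TextBufferDemo {
    public static void main(String[] args) {
        // Previous record filled the buffer with "field1\tfield2\tfield3"
        byte[] buffer = "field1\tfield2\tfield3".getBytes();
        // Current record "a\tb" (3 valid bytes) was copied over the start
        System.arraycopy("a\tb".getBytes(), 0, buffer, 0, 3);
        int validLength = 3; // what Text.getLength() would report

        // Wrong: converts the whole backing array, including stale bytes
        String wrong = new String(buffer);
        // Right: restricts to the valid length (what value.toString() does)
        String right = new String(buffer, 0, validLength);

        System.out.println(wrong.split("\t").length); // 4 fields: stale tail included
        System.out.println(right.split("\t").length); // 2 fields: correct
    }
}
```

If this is the cause, the extra fields would come from leftover bytes of earlier lines, which would explain split counts larger than 44.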
What can I do to solve this problem?
Thanks!