
I set the property mapred.textoutputformat.separator to the value \001, but when I run the MapReduce job it throws this exception:

Character reference "&#1" is an invalid XML character.

Please help me.

Alex Hadley
Sneha Parameswaran

1 Answer


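I found the solution. When the separator is "\001" (or a similar non-printable Unicode character), it gets transformed into an invalid form while the job configuration is serialized. The "&#1" in the error message is that character written as an XML character reference, which XML does not allow.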

So the solution was to Base64-encode the separator before setting it in the configuration, override the getRecordWriter method of the TextOutputFormat class, and decode the value there (Base64.decodeBase64).

This will work.
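Here is a minimal sketch of the encode/decode round trip only, not my actual job code. It uses java.util.Base64 from the JDK instead of the commons-codec Base64.decodeBase64 call mentioned above, and a plain Map stands in for the Hadoop Configuration. The class name SeparatorCodec and the property name custom.output.separator.base64 are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps the control character out of the serialized config.
final class SeparatorCodec {
    // Hypothetical property name, not a real Hadoop key.
    public static final String KEY = "custom.output.separator.base64";

    // Encode the raw separator so only XML-safe characters are stored.
    public static String encode(String separator) {
        return Base64.getEncoder()
                .encodeToString(separator.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the stored value back to the raw separator.
    public static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A Map stands in for the job configuration (assumption).
        Map<String, String> conf = new HashMap<>();

        // Driver side: store the encoded separator instead of "\001".
        conf.put(KEY, encode("\u0001"));
        System.out.println(conf.get(KEY)); // prints AQ==

        // Output-format side: decode before writing records.
        String separator = decode(conf.get(KEY));
        System.out.println((int) separator.charAt(0)); // prints 1
    }
}
```

In the real job, the decode step would go inside the overridden getRecordWriter, and the decoded string would be used as the key/value separator when the record writer is created.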

Sneha Parameswaran