I'm trying to use ^A (Ctrl-A, \u0001) as the separator between key and value in my reduce output files. I found that the config setting "mapred.textoutputformat.separator" is what I want, and this correctly switches the separator to ",":
conf.set("mapred.textoutputformat.separator", ",");
But it can't handle the ^A character:
conf.set("mapred.textoutputformat.separator", "\u0001");
throws this error:
ERROR security.UserGroupInformation: PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 94; Character reference "&#
I found this ticket, https://issues.apache.org/jira/browse/HADOOP-7542, and see that they tried to fix this but reverted the patch due to XML 1.1 concerns.
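For what it's worth, the parse failure itself seems reproducible outside Hadoop. This is only a minimal sketch, assuming the job configuration gets written out as XML 1.0 and parsed again (which is what the SAXParseException suggests); the class name ControlCharInXml and the helper parses are made up for illustration and are not part of Hadoop:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-alone check, not Hadoop code.
public class ControlCharInXml {

    // Returns true if the JDK's default (XML 1.0) parser accepts the document.
    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = "<name>mapred.textoutputformat.separator</name>";
        // A comma is an ordinary character, so this property parses.
        String comma = "<property>" + name + "<value>,</value></property>";
        // U+0001 written as a character reference, as in the error above.
        String ctrlA = "<property>" + name + "<value>&#1;</value></property>";

        System.out.println("comma parses: " + parses(comma));
        System.out.println("ctrl-A parses: " + parses(ctrlA));
    }
}
```

If that assumption holds, the problem is in how the value travels through the XML config rather than in TextOutputFormat itself, since XML 1.0 does not allow U+0001 even as a character reference.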
So I'm wondering if anyone has had success setting the separator to ^A (which seems like a pretty common choice) using an easy workaround, or if I should just settle for the tab separator.
Thanks!
I'm running Hadoop 0.20.2-cdh3u5 on CentOS 6.2