I am trying to read a file using the Spark reader. The Spark reader splits records in the file when it encounters control characters such as ^M, ^H, ^O, and ^P.
To debug the issue, I am manually removing the control characters from the file and then testing the record length with spark-shell.
I tried the following to replace all control characters and check the record length:
sed -i 's/^[:print:]/ /g' <filename>
I found that it also replaces punctuation characters like ? with a space. Please suggest a command that will replace all control characters with spaces.
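From what I have read, the POSIX [[:cntrl:]] character class should match only control characters (not punctuation), so I suspect something like the command below is closer to what I need (GNU sed assumed, <filename> is a placeholder), but I am not sure it is correct for my case:

sed -i 's/[[:cntrl:]]/ /g' <filename>

I am also unsure whether this would replace tab characters (which are control characters) that I might need to keep as field separators.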