I receive a pipe-delimited text file from a user who populates an Excel spreadsheet using screen scrapes, so the data is a mess. It is full of stray ^M (carriage return) and <96> (Windows en dash) characters, which cause the import to be incomplete.

I tried dos2unix, but it reported a problem with the conversion. I removed all the ^M characters using this solution I found on this site:

tr -d '\r' < infile > outfile

The <96> characters remain. What would be the equivalent of '\r' for these dashes? Or is there a better solution? Ideally, I would like to replace the "bad" dashes with "good" dashes.

Inian
Morgan

2 Answers

Why not just clean up the file using SAS instead? If your lines are shorter than 32,767 characters, it is simple.

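data _null_;
  /* TERMSTR=LF splits lines on line feeds only, so any CR stays in the data */
  infile 'input-file' termstr=LF ;
  file 'output-file' termstr=LF ;
  input;
  /* Remove every CR ('0D'x), then turn each '96'x byte into a plain hyphen */
  _infile_=translate(compress(_infile_,'0D'x),'-','96'x);
  put _infile_;
run;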

If the lines are longer you can read the data field by field and fix it instead.

Tom
You can find the character's octal value with od (for example, od -c file.txt) and then delete or replace it with tr, just as you did with the ^M characters. The <96> byte is hexadecimal 96, which is octal 226.
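As a rough sketch of that approach, something like the following might work. It assumes the file is in a single-byte encoding such as Windows-1252, so that the en dash really is the lone byte 0x96, and the function name clean_dashes is only made up for illustration. It replaces the dash with an ASCII hyphen instead of deleting it, since that is what the question asks for.

```shell
# Sketch: strip carriage returns and map the Windows-1252 en dash
# (byte 0x96, octal 226) to an ASCII hyphen. Reads stdin, writes stdout.
# clean_dashes is a hypothetical helper name, not an existing tool.
clean_dashes() {
    # LC_ALL=C asks tr to treat the input as raw bytes, not multibyte text.
    LC_ALL=C tr -d '\r' | LC_ALL=C tr '\226' '-'
}

# To confirm the byte value first, inspect the file with od;
# the en dash byte is printed as 226 in the -c output:
#   od -c infile | head

# Example usage with the file names from the question:
#   clean_dashes < infile > outfile
```

If the file turns out to be UTF-8 instead, the dash would be a three-byte sequence and a byte-for-byte tr mapping would not be the right tool.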

Arun Kumar