Easiest - strings
The easiest way to do this is with the strings command:
$ cat /tmp/asdf
in Arizona w/ fiancÃÂÃÂÃÂ
$ strings /tmp/asdf
in Arizona w/ fianc
The problems with this approach:
- It's not using sed
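- It inserts a line break wherever it finds a non-printable character (that's fine in your example, since they're all grouped at the end, but text with non-printable characters in the middle of a line will be split, and by default any run shorter than 4 printable characters is dropped)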
Ugliest - sed's l plus sed post-processing
Now, if you must use sed, then here's an alternative:
$ sed -n l /tmp/asdf | sed -E 's/\\[[:digit:]]{3}//g; s/\$$//'
in Arizona w/ fianc
Here, you're using l to 'dump' non-printable characters, transforming them into octal representations like \303, then removing anything that looks like one of those octal escapes, and then removing the $ that l added at the end of the line.
It's kinda ugly, and it may interact badly with your file if it contains a backslash followed by three digits, so I'd stay with the strings option.
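To make that caveat concrete, here is a small sketch of the same pipeline wrapped in a function. The function name strip_octal and the sample inputs are made up for illustration, and the LC_ALL=C prefix is my own addition to keep l working byte by byte; the behaviour shown assumes GNU sed.

```shell
# Hypothetical wrapper around the pipeline above; reads stdin, writes stdout.
# LC_ALL=C is an assumption added here so that l escapes every non-ASCII byte.
strip_octal() {
  LC_ALL=C sed -n l | sed -E 's/\\[[:digit:]]{3}//g; s/\$$//'
}

# Non-ASCII bytes: l prints the two UTF-8 bytes of e-acute as \303\251,
# and the cleanup step removes both escapes.
printf 'fianc\303\251\n' | strip_octal   # prints: fianc

# Literal backslash plus three digits in the data: l prints the backslash
# as \\, and the cleanup then eats one backslash together with the digits.
printf 'dir\\123\n' | strip_octal        # prints: dir\
```

The second call is the bad interaction: the original line was dir\123 and the digits are gone. Note also that l folds long output lines (GNU sed wraps at 70 columns by default, and documents 'l 0' to turn wrapping off), which this post-processing does not undo.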
Better - sed ranges with high Unicode characters
The one below is also a hack, but it looks better than the rest. It uses a sed range starting with '¡'. I picked that symbol because it is the second* printable character in the upper half of the iso-8859-1 encoding, which maps onto the Unicode block right after ASCII. So, I'm guessing that your trouble is not with actual control codes but with non-ASCII characters (anything above 127 decimal).
For the second item in the range, just pick some non-Latin character (Japanese, Chinese, Hebrew, Arabic, etc.), hoping it sits high enough in Unicode that the range covers all of your 'non-printing' characters.
Unfortunately, sed does not have an [[:ascii:]] class, nor does it accept open-ended ranges, so you need a hack like this one.
$ sed 's/[¡-ﺏ]/ /g' /tmp/asdf
in Arizona w/ fianc
(*) Note: I picked the second character in the range because the first one is a non-breaking space, which would be hard to tell apart from a normal space when reading the command.
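If guessing the upper end of the range feels too fragile, one variation on the same idea is to flip it around: force the C locale so sed works on single bytes, then delete everything outside printable ASCII. This is a different technique from the range above, offered only as a sketch; the function name strip_non_ascii is made up, and I'm assuming a sed whose bracket expressions match bytes above 127 in the C locale (GNU sed does).

```shell
# Hypothetical helper: delete every byte outside printable ASCII (space to ~).
# LC_ALL=C makes sed treat the input as bytes, so no upper bound has to be guessed.
# Caveat: tabs and other control characters are removed as well.
strip_non_ascii() {
  LC_ALL=C sed 's/[^ -~]//g'
}

# The two UTF-8 bytes of e-acute (\303\251) are both outside the range.
printf 'in Arizona w/ fianc\303\251\n' | strip_non_ascii   # prints: in Arizona w/ fianc
```

Unlike the version with '¡', this deletes the bytes instead of replacing them with a space; change the replacement to ' ' if you prefer to keep a placeholder for each removed byte.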