0

I want to remove control characters (like ^C, ^A, and so on) from standard input and print the result to standard output, using just basic bash, perl and some other Linux tools.

What I do right now is

(something) | sed 's/[[:cntrl:]]//g' | (something else)

This worked until now, but I found out that it also removes tabs, and I want to keep those.

So, is there something else that just works?

Karel Bílek

3 Answers

3

Modifying the second answer from "Skip/remove non-ascii character with sed", I got this working sed script:

sed 's/[^[:print:]\t]//g'

It seems to work (although the "non-ascii" part is wrong: it does not remove any Unicode characters).

For Unicode to work, the locale environment variables have to be set and exported, e.g. LANG=en_US.UTF-8 and LC_CTYPE="en_US.UTF-8".
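A quick way to check that the expression keeps tabs is to feed it a line with known control bytes and inspect the result with od. This is only a sketch: it assumes GNU sed (which accepts \t inside a bracket expression), and strip_ctrl is a made-up helper name, not something from the answer.

```shell
# strip_ctrl is a hypothetical helper name, not part of the original answer.
# Assumes GNU sed, which understands \t inside a bracket expression.
strip_ctrl() {
  sed 's/[^[:print:]\t]//g'
}

# Sample input: "a", Ctrl-A (0x01), "b", TAB, "c", Ctrl-C (0x03), "d".
# od -c makes the surviving bytes visible, including the tab.
printf 'a\001b\tc\003d\n' | strip_ctrl | od -c
```

In the od output the tab should still be there, while the 0x01 and 0x03 bytes are gone. What counts as printable for non-ASCII input depends on the locale the command runs in.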

Karel Bílek
It certainly does remove non-printable Unicode, but it will only expect Unicode input if your locale environment variables are set appropriately (e.g. `LANG=en_US.UTF-8`). – ysth Apr 17 '13 at 02:11
1

You could just define the character class yourself based on the definition of [:cntrl:]:

sed 's/[\x00-\x08\x0A-\x1F\x7F]\{1,\}//g'
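The same "spell out the class yourself" idea can also be sketched with tr instead of sed, which avoids relying on \x escapes. This is a different tool from the one in the answer, so treat it as an alternative under assumptions: strip_ctrl_tr is a made-up name, and the set below deliberately keeps both tab and newline.

```shell
# strip_ctrl_tr is a hypothetical helper name used only for this sketch.
# Octal ranges: \000-\010 (NUL..BS), \013-\037 (VT..US), \177 (DEL).
# TAB (\011) and LF (\012) are deliberately left out of the set.
strip_ctrl_tr() {
  LC_ALL=C tr -d '\000-\010\013-\037\177'
}

# Sample input with Ctrl-A, a tab, Ctrl-C and a carriage return.
printf 'a\001b\tc\003d\r\n' | strip_ctrl_tr | od -c
```

Because tr works on the whole byte stream rather than line by line, the newline has to be excluded from the set explicitly. Bytes above 0x7F are not in the set, so UTF-8 text passes through untouched.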
Borodin
Tim Pote
1

You can try ssed (super-sed) with Perl regexes:

echo -e 'hello\tworld' | ssed -R 's/(?!\t)[[:cntrl:]]//g'
kev