
I want to replace non-ASCII characters or specific ASCII characters with a space in a file using shell scripting, sed or Perl.

First, I need to replace all non-ASCII characters in the file with a space. I know this can be done with the command below:

perl -pi -e 's/[[:^ascii:]]/ /g'

There are also certain ASCII characters that the downstream system cannot accept, so I want to replace those with a space as well. For example, the ASCII character with value 0x19 (EM, end of medium) is not accepted downstream, and I want to replace it with a space.

I also know of a range of ASCII characters that cause problems downstream, and I want to replace each of them with a space.

Can I get help to accomplish this?

Note: The Perl version on our system is 5.8.4, and I need to do this on a Solaris 10 machine.

Thanks

Borodin
Chkusi
  • Your question seems to be more complex than you have described. Please post something more relevant instead of engaging in chat to define your problem – Borodin Feb 13 '13 at 22:41
  • @Borodin: I have given a proper description of my issue. I do not understand why you felt that way and down-voted. I have explained the issue clearly. What I need is a solution approach that covers the whole character range. – Chkusi Feb 13 '13 at 22:48

1 Answer


You can just add them to the character class in your regex. For example, to replace non-ASCII characters, plus \031 (octal for 0x19) and, say, characters in the range a-e, with a space, you would write:

perl -pi -e 's/[[:^ascii:]\031a-e]/ /g'

Edited to add:

For your new requirement:

I have to replace non-ASCII characters (DEC 128 and above), with the exception of DEC 145-148 and DEC 150-151, with a space.

You can write:

perl -pi -e 's/[^[:ascii:]\x91-\x94\x96\x97]/ /g; s/\031/ /g;'

(Note the change from [:^ascii:] "non-ASCII characters" to [:ascii:] "ASCII characters", and the change from [...] "any of the characters ..." to [^...] "any character other than ...".)

ruakh
  • Thanks. Suppose I have to replace ASCII from DEC 128 and above, with the exception of DEC 145-148 and DEC 150-151, with a space. How can we do that? – Chkusi Feb 13 '13 at 18:30
  • @Chkusi: Wait, that doesn't make sense. To Perl, `[:ascii:]` means characters in the range 0-127. "ASCII from DEC 128 and above" is inherently contradictory. – ruakh Feb 13 '13 at 18:56
  • @FtLie: No, that's octal. (And you missed the "and DEC 150-151" part.) – ruakh Feb 13 '13 at 18:57
  • @ruakh: Sorry, I should not have called it ASCII. I have to replace non-ASCII characters (DEC 128 and above), with the exception of DEC 145-148 and DEC 150-151, with a space. – Chkusi Feb 13 '13 at 19:08
  • @ruakh: In the Perl statement I would need to use hex values, correct? Is there a "]" missing in the second Perl statement above? – Chkusi Feb 13 '13 at 21:11
  • @Chkusi: Re: needing to use hex values: Correct. Perl only supports hex and octal notations, not decimal, and even for octal, there are problems above `\077`. Re: there being a `]` missing: No, I don't think so. The command works for me. What makes you say that? – ruakh Feb 13 '13 at 21:23
  • @ruakh: Sorry, my bad. I just overlooked it; nothing is missing. So you mean hex is the ideal notation to use with Perl for the replacement, correct? Also, I am having trouble replacing a few non-ASCII characters with a space, for example the character with DEC 195. Here is a link to an ASCII table: http://www.asciitable.com/ – Chkusi Feb 13 '13 at 21:34
  • @Chkusi: Sorry, I don't understand your question. Yes, you should use hex. And the command that I posted will replace decimal 195 just fine. – ruakh Feb 13 '13 at 21:41
  • @ruakh: Decimal 195 is a non-printable, non-ASCII character. You can see it at http://www.asciitable.com/. Our replacement condition is 's/[^[:ascii:]\x91-\x94\x96\x97]/ /g', so apart from the characters listed in the condition, everything else should be replaced by a space. Decimal 195 is one character that should have been replaced by a space, but that is not happening. The same thing happens for a character that looks like a double "!". – Chkusi Feb 13 '13 at 22:08
  • @Chkusi: Sorry, but you must be doing something wrong. I've tested my code, and it *does* replace decimal 195 with a space. Why do you think that it doesn't? – ruakh Feb 13 '13 at 22:12
  • @ruakh: This is getting a long way from being useful to the community. Your *"For your new requirement"* should refer to the original question, but seems to be a response to a comment somewhere? You should encourage the OP to post a new question instead of letting this become a public chat session – Borodin Feb 13 '13 at 22:37
  • @Borodin: I don't think it's supposed to be a new question, I think that the OP is just having difficulty expressing his/her requirements properly. And I don't think this was an appropriate use of a downvote. (It's a major stretch to claim that the answer is not useful simply because it includes an additional section addressing further comments.) But whatever. \*shrug\* – ruakh Feb 13 '13 at 22:41
  • @ruakh: I don't think I am doing something wrong, because I am using the same command you gave above. However, when I open the file containing the decimal 195 character in the WinSCP SSH client, I can see the actual characters properly, as shown at http://www.asciitable.com/. But when I open the same file in the vi editor, or cat the file in the PuTTY SSH client, I cannot see those characters properly. I also ran od -c, which gives an octal dump, but I am not getting the correct octal value according to the ASCII table at http://www.asciitable.com/ – Chkusi Feb 13 '13 at 22:45
  • @Chkusi: Octal 031 has nothing to do with decimal 195. It's a completely different character. I'll edit my answer to replace it as well, but you really need to do a better job clarifying your questions. – ruakh Feb 13 '13 at 23:14
  • @ruakh: The main issue here is that in the WinSCP SSH client I see one character, which is correct, and in the PuTTY SSH client I see a different, junk character, which is not. I am running od -c from the PuTTY SSH client and getting the octal value 031. – Chkusi Feb 13 '13 at 23:19
  • @Chkusi: If your "main issue" is that WinSCP and PuTTY are showing you two different things, then how come your question says nothing about that? – ruakh Feb 13 '13 at 23:25
  • @ruakh: I was testing it using the approach you gave, and during that time I noticed this happening. The WinSCP and PuTTY issue was not the main issue when we started and had not been noticed, but now we are facing it. That's why I said it is the main issue now. Anyway, thanks a lot for all your help. I appreciate it. – Chkusi Feb 13 '13 at 23:44
  • @Chkusi: You're welcome. I hope you figure out your other issues. – ruakh Feb 13 '13 at 23:55
  • This comment thread has gotten quite long, please move extended discussions to [chat]. – Tim Post Feb 14 '13 at 11:29
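Much of the confusion in the comments above is between octal and decimal byte values: `od -c` reports octal, so its 031 is decimal 25 (0x19), while decimal 195 is octal 303 (0xC3). A minimal sketch for checking raw byte values independently of how a terminal renders them, assuming an `od` that accepts the POSIX `-t u1` format (not verified here on Solaris 10) and using two made-up sample bytes:

```shell
# Two sample bytes (an assumption for illustration): octal 031 and octal 303.
# -c shows characters and octal escapes, as in the od -c run in the comments.
printf '\031\303' | od -An -c

# -t u1 prints each byte as an unsigned decimal number instead;
# xargs collapses od's padding into single spaces.
dec=$(printf '\031\303' | od -An -tu1 | xargs)
printf '%s\n' "$dec"
```

Comparing the decimal dump of a file before and after the substitution shows exactly which bytes were replaced, whatever WinSCP, PuTTY, or vi choose to display.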