2

I need a Unix command to verify that a file contains only printable ASCII characters (between ASCII hex 20 and 7E inclusive).

I have the command below, which checks whether a file contains non-ASCII characters, but I cannot work out how to adapt it to the question above.

if LC_ALL=C grep -q '[^[:print:][:space:]]' file; then
    echo "file contains non-ascii characters"
else
    echo "file contains ascii characters only"
fi 
fedorqui
austin

2 Answers

3


To find characters in the range hex 20 to 7E in a file, you can use:

grep -P "[\x20-\x7E]" file

Note the use of -P to enable Perl-compatible regular expressions.

But in this case you want to check whether the file contains only this kind of character. The best approach is to check whether any character falls outside this range, that is, to match [^range]:

grep -P "[^\x20-\x7E]" file

All together, I would say:

grep -qP "[^\x20-\x7E]" file && echo "weird ASCII" || echo "clean one"
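
Where grep has no -P option, the same idea (look for anything outside the range) can be sketched with tr, which accepts octal escapes under POSIX. This is a swapped-in technique rather than the grep approach above, and only a minimal sketch: the function name count_outside_printable is hypothetical, and it assumes newlines should be tolerated as line terminators.

```shell
# Hypothetical helper: count the bytes in a file that fall outside 0x20-0x7E.
# Assumption: newlines are allowed, since they only terminate lines.
count_outside_printable() {
    # Delete every allowed byte (octal 040-176 plus newline), count what is left,
    # and strip the padding that some wc implementations add.
    LC_ALL=C tr -d '\040-\176\n' < "$1" | wc -c | tr -d ' '
}
```

A result of 0 means the file is clean, so a test such as [ "$(count_outside_printable file)" -eq 0 ] can stand in for the grep -q check.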
fedorqui
  • Thank you for responding but I got this as a result $ grep -qP "[^\x20-\x7E]" au26026.txt && echo "weird ASCII" || echo "clean one" grep: illegal option -- P usage: grep [-E|-F] [-c|-l|-q] [-bhinsvwx] -e pattern_list... [-f pattern_file...] [file...] usage: grep [-E|-F] [-c|-l|-q] [-bhinsvwx] [-e pattern_list...] -f pattern_file... [file...] usage: grep [-E|-F] [-c|-l|-q] [-bhinsvwx] pattern [file...] clean one – austin Aug 10 '15 at 15:04
  • This is because you are using an old `grep`. What OS are you working on? What do you get if you say `grep --version`? – fedorqui Aug 10 '15 at 15:05
  • I am using putty, release 0.60. This is what I have to use at my work, no other option to access the server files. The result I get from `grep --version` is `$ grep --version grep: illegal option -- - usage: grep [-E|-F] [-c|-l|-q] [-bhinsvwx] -e pattern_list... [-f pattern_file...] [file...] usage: grep [-E|-F] [-c|-l|-q] [-bhinsvwx] [-e pattern_list...] -f pattern_file... [file...] usage: grep [-E|-F] [-c|-l|-q] [-bhinsvwx] pattern [file...]` – austin Aug 10 '15 at 15:11
  • @austin and what do you see if you say `man grep`? It looks like you are using something strange. – fedorqui Aug 10 '15 at 15:12
  • `NAME grep, egrep, fgrep - search a file for a pattern SYNOPSIS Plain call with pattern grep [-E|-F] [-c|-l|-q] [-bhinsvwx] pattern [file ...] Call with (multiple) -e pattern grep [-E|-F] [-c|-l|-q] [-bhinsvwx] -e pattern... [-e pattern] ... [file ...] Call with -f file grep [-E|-F] [-c|-l|-q] [-bhinsvwx] [-f pattern_file] [file ...] Obsolescent: egrep [-cefilnsv] [expression] [file ...] fgrep [-cefilnsvx] [strings] [file ...]` followed by description of the grep command. – austin Aug 10 '15 at 15:30
  • @austin it is still unclear what version you are running. Without that I cannot suggest alternatives. – fedorqui Aug 10 '15 at 15:41
  • Actually, this is tagged "unix", not "linux", and there is no standard "-P" option for grep in unix. – Thomas Dickey Aug 10 '15 at 16:53
  • Thank you for the suggestions, I finally got it to work for me. – austin Aug 10 '15 at 17:34
  • @austin mmm could you indicate how you make it work? You can even post an answer with your solution, since it can help other people. – fedorqui Aug 10 '15 at 17:49
  • My solution was pretty simple. `cat au26026.txt | tr -d '\t' | grep '[^\x20-\x7E]'` Although, I tried using color in grep but couldn't. Maybe it's because I am using putty, release 0.6. Putty uses unix commands but often rejects few commands that I get to work in Ubuntu. – austin Aug 10 '15 at 18:06
  • @austin ah nice! So probably my solution works if you remove the `-P` from it. I thought it needed -P to work. – fedorqui Aug 10 '15 at 18:42
  • @austin get rid of the UUOC: `cat au26026.txt | tr -d '\t'` should be `tr -d '\t' < au26026.txt`. – Ed Morton Aug 10 '15 at 18:56
  • @austin I wonder how `grep '[^\x20-\x7E]'` works for you without the `-P`. For me it shows a `grep: Invalid range end` error. – fedorqui Aug 11 '15 at 09:36
  • @fedorqui I use putty 0.6 which is really weird because I cannot use BACKSPACE key in putty terminal if I made a mistake, I have to use CTRL+C to disregard the command and retype in new prompt. It has to do with server I connect to because the commands in putty terminal (when I connect to a different server) works fine. – austin Aug 13 '15 at 18:24
  • @edmorton Thank you for 'tr -d '\t' < au26026.txt' suggestion, but I always use data pipeline because of the data file testing I do at my work. – austin Aug 13 '15 at 18:26
  • @austin wrt using control-C, etc. - `man stty` and look for how to set `erase` and `intr` and put that in your `.profile` or equivalent file, e.g. `stty echoe erase '^h' intr '^?'` where `^h` and `^?` are literal Backspace (control-h) and Delete (control-c) keypress chars which you can enter after a control-v in `vi`. – Ed Morton Aug 13 '15 at 18:40
  • @fedorqui - FWIW that range works fine for me with GNU grep 2.21 on cygwin. – Ed Morton Aug 13 '15 at 18:45
0

This can be done in unix using the POSIX grep options:

if LC_ALL=C grep -q '[^ -~]' file; then
    echo "file contains non-ascii characters"
else
    echo "file contains ascii characters only"
fi

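where the characters in [ ... ] are ^ (caret, which negates the set), space, - (ASCII minus sign, which forms a range), and ~ (tilde). Space is hex 20 and tilde is hex 7E, so the expression matches any character outside that range.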

You could also allow ASCII tab. The standard refers to these as collating elements. The \x (hexadecimal) and \0 (octal) notations are shown in the standard (see 7.4.1), but POSIX treats a backslash inside a bracket expression as a literal character, so \x09 or \011 is not portable there. The safe choice is to put a literal tab character inside the brackets.

According to the description, grep by default treats the pattern as a basic regular expression (BRE). Adding -E would make it an extended regular expression, but that is not needed here.
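
The check can be wrapped in a small function that also tolerates tabs. This is only a sketch under stated assumptions: the name is_printable_ascii is hypothetical, and the tab is built with printf so that no backslash escape is needed inside the bracket expression.

```shell
# Hypothetical helper: succeed only if the file holds nothing but
# printable ASCII (space through tilde), tabs, and line terminators.
is_printable_ascii() {
    # Refuse unreadable files so that a grep error is not mistaken for "clean".
    [ -r "$1" ] || return 2
    # Build a literal tab character for use inside the bracket expression.
    tab=$(printf '\t')
    # grep -q succeeds when an out-of-range character exists, so invert its status.
    ! LC_ALL=C grep -q "[^ -~${tab}]" "$1"
}
```

It can then replace the condition in the if statement above, for example: if is_printable_ascii file; then echo "file contains ascii characters only"; fi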

Thomas Dickey