Regex "^[[:digit:]]$" not working as expected in AWK/GAWK

Question

My GAWK version on RHEL is:

gawk-3.1.5-15.el5

I want to print a line only if its first field consists entirely of digits (no special characters, not even a space).

Example:

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^[[:digit:]]$/)  print $0}'

Output:
Nothing

Expected Output:
123456789012345,3

What is going wrong here? Does my AWK version not understand the GNU character classes? Kindly help.

Inian · Accepted Answer · 2016-12-23T09:11:18.380

To match multiple digits with the [[:digit:]] character class, add a +, which means match one or more digits in $1. Without a quantifier, ^[[:digit:]]$ matches only a field that is exactly one digit long.

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]+)$/)  print $0}'
123456789012345,3

which satisfies your requirement.
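
To make the difference concrete, here is a minimal sketch (the sample input lines are made up for illustration) that runs the original pattern and the corrected one against the same two lines:

```shell
# Without a quantifier the class matches exactly one character,
# so only the line whose first field is a single digit is printed.
printf '%s\n' '7,x' '123456789012345,3' | awk -F, '$1 ~ /^[[:digit:]]$/'
# prints: 7,x

# With + the class matches one or more digits, so both lines are printed.
printf '%s\n' '7,x' '123456789012345,3' | awk -F, '$1 ~ /^[[:digit:]]+$/'
# prints: 7,x
#         123456789012345,3
```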

A more idiomatic way (as suggested in the comments) is to drop the explicit if and print and use the match as a bare pattern, since awk prints the matching line by default,

echo "123456789012345,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
123456789012345,3

Some more examples which demonstrate the same,

echo "a1,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'

(and)

echo "aa,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'

do NOT produce any output, as per the requirement.

For strict length checking, a POSIX interval expression can be used, as below, where {3} means exactly three digits. Older gawk versions such as 3.1.5 need the --posix switch for interval expressions to work.

echo "123,3" |  awk --posix -F, '$1 ~ /^[0-9]{3}$/'
123,3

(and)

echo "12,3" |  awk --posix -F, '$1 ~ /^[0-9]{3}$/'

does not produce any output.
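
If interval expressions are not available in your awk, roughly the same check can be sketched by combining the digits-only match with length(). This is a substitute technique, not the one shown above, and exact_digits is a hypothetical helper name used only for this illustration:

```shell
# Hypothetical helper: print lines whose first comma-separated field
# is made up only of digits and is exactly N characters long.
exact_digits() {
    awk -F, -v n="$1" '$1 ~ /^[0-9]+$/ && length($1) == n'
}

printf '%s\n' '123,3' '12,3' '1a3,3' '1234,3' | exact_digits 3
# prints: 123,3
```

Because the length is passed in with -v, the same function can be reused for other field widths.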

If you are using a relatively recent version of the bash shell, it supports a native regex operator, =~, which works with POSIX character classes as above, something like

#!/bin/bash

while IFS=',' read -r row1 row2
do
   [[ $row1 =~ ^([[:digit:]]+)$ ]] && printf "%s,%s\n" "$row1" "$row2"
done < file

For an input file say file

$ cat file
122,12
a1,22
aa,12

The script produces,

$ bash script.sh
122,12

Although this works, bash regex matching can be slower. A relatively straightforward alternative using string manipulation would be something like

while IFS=',' read -r row1 row2
do
   [[ -z "${row1//[0-9]/}" ]] && printf "%s,%s\n" "$row1" "$row2"
done < file

The "${row1//[0-9]/}" strips all the digits from the row and the condition becomes true only if there are no other characters left in the variable.
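
One edge case to be aware of: when row1 is empty there is nothing left after stripping either, so the test above also passes for a line such as ",7". If that is not wanted, a possible variation is sketched below. It swaps the bash-only expansion for a plain case pattern, so it should also run under a POSIX sh; digits_only_rows is a hypothetical name chosen for this example:

```shell
# Hypothetical helper: read comma-separated lines from stdin and print
# those whose first field is non-empty and contains only digits.
digits_only_rows() {
    while IFS=',' read -r row1 row2
    do
        case $row1 in
            ''|*[!0-9]*) ;;                         # empty, or has a non-digit: skip
            *) printf '%s,%s\n' "$row1" "$row2" ;;  # digits only: print the line
        esac
    done
}

printf '%s\n' '122,12' 'a1,22' 'aa,12' ',7' | digits_only_rows
# prints: 122,12
```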

For some reason experts encourage to use more idiomatic `awk` as in `echo "123456789012345,3" | awk -F, '$1 ~ /^[[:digit:]]*$/'`. Isn't that `print` redundant here? — sjsam, Dec 23 '16 at 08:26
@sjsam : Of course yes! Sometimes when using the OP's own command and modifying on top of it, some tiny details are lost. Nice catch though! Feel free to edit it, being your valid point! — Inian, Dec 23 '16 at 08:31
I may not as you have already covered what is wrong with op's regex match. By the way it would have been more interesting if the op has an input like `echo ",3" | awk -F, '$1 ~ /^[[:digit:]]*$/'` — sjsam, Dec 23 '16 at 08:39
As OP is using version 3.1.5 of Gnu awk you should probably add `--posix` switch to the quantifier (`{n,m}`) examples. — James Brown, Dec 23 '16 at 08:59
@All : Thanks Guys. I found what I was missing. `echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]*)$/) print $0}'` from @Ravinder or the `echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]+)$/) print $0}'` from @Inian does the trick. — dig_123, Dec 23 '16 at 10:18

score 3 · Answer 2 · answered Dec 23 '16 at 08:45

Here you are printing every line that matches a pattern. This is exactly the purpose of grep. Since @Inian brilliantly told you what was wrong with your code, let me propose an alternative grep-based answer that does exactly the same as the awk command (albeit much faster):

grep -E '^[[:digit:]]+,'
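
As a quick sketch of how this behaves (the sample lines are invented for illustration), note that the grep pattern expects a comma right after the digits, so a line consisting of a single all-digit field with no comma is printed by the awk version but not by grep:

```shell
# grep: the digits must be followed by a comma
printf '%s\n' '122,12' 'a1,22' '999' | grep -E '^[[:digit:]]+,'
# prints: 122,12

# awk: a lone all-digit field also counts as a match
printf '%s\n' '122,12' 'a1,22' '999' | awk -F, '$1 ~ /^[[:digit:]]+$/'
# prints: 122,12
#         999
```

For input where every line has at least two fields, the two commands select the same lines.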

xhienne

@ xhienne : The actual file on which I need to process will be very big. That's the reason of using the awk. I had just taken out a code line from my otherwise complete script, to avoid unnecessary confusion, and to be very clear on what I'm missing, and what I expect. – dig_123 Dec 23 '16 at 10:20
@dig_123 If your file is very big, this is the very reason to prefer `grep` to `awk` which may be more than 100x slower. Unless you need advanced functionalities in `awk`? Maybe you can put your `awk` at the output of `grep`, you would see some real speed-up I guess. – xhienne Dec 23 '16 at 11:11

score 2 · Answer 3 · answered Dec 23 '16 at 07:42

Could you please try the following and let me know if this helps.

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]*)$/)  print $0}'

EDIT: The above code can be reduced a bit, as follows.

echo "123456789012345,3" | awk -F, '($1 ~ /^[[:digit:]]*$/)'
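
Note that * and + differ when the first field is empty, which is the case raised in the comments on the accepted answer. A small sketch of that input:

```shell
# * means zero or more digits, so an empty first field matches
echo ",3" | awk -F, '$1 ~ /^[[:digit:]]*$/'
# prints: ,3

# + means one or more digits, so an empty first field does not match
echo ",3" | awk -F, '$1 ~ /^[[:digit:]]+$/'
# prints nothing
```

Which one is appropriate depends on whether an empty first field should count as "all digits".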
