I need to find out whether a field in a pipe-delimited file is numeric. I need to report the field if it is not numeric, and ignore it if it is numeric or null (empty). I have other computations as well.
I wrote this code:
gawk -v w_column_pos="$column_pos" -F "|" '
$w_column_pos !~ /^([+-]|[0-9])[0-9]*(.[0-9]*)$|^([+-]|[0-9])[0-9]*$|^$/ { print $w_column_pos," is not Numeric"; } ' $src_data_file
Here, -v w_column_pos="$column_pos" passes the column number to gawk.
The problem is that it does not report an error for 202D, 203B, etc.; it accepts one alphabetic character. But it does report an error for 202DD.
I previously had /^([+-]|[0-9])[0-9]*(.[0-9]*)?$|^$/; this had the same issue.
Sample input file
Name|Designation|Is Employee| Organisation ID|Hire Date
Alex Conolly|Prof1|TrUE|100|12072015
Thomas |Prof2|TRUE|200B|09072016
Christine prof1|FALSE||24902007
John Martini|PPP|TRUE~FALSE|202|11782099
xxYY |PPP|TRUE|91.67|11782099
ABD S | XXX | FALSEx | 209|11093000
I am asking about the 4th column, Organisation ID, which is a numeric type.
My code otherwise works fine, but 200B (in the 3rd row) is not reported.
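One likely cause of the behaviour described above: in both patterns the `.` is unescaped, so it matches any single character rather than a literal decimal point. `202D` therefore matches as digits, then one "any character" (`D`), then zero further digits, while `202DD` fails because the second `D` is not a digit. The following is only a minimal sketch of an escaped variant, not a definitive fix. It assumes plain `awk` (the same program should work under `gawk`), and the temporary file and its rows are hypothetical test data modelled on the sample, with one row added to exercise the empty field.

```shell
# Hypothetical test data modelled on the sample input; the last row is
# an assumption added to exercise the empty-field case.
src_data_file=$(mktemp)
cat > "$src_data_file" <<'EOF'
Alex Conolly|Prof1|TrUE|100|12072015
Thomas |Prof2|TRUE|200B|09072016
John Martini|PPP|TRUE~FALSE|202|11782099
xxYY |PPP|TRUE|91.67|11782099
Jane |PPP|TRUE||11782099
EOF
column_pos=4

result=$(awk -v w_column_pos="$column_pos" -F "|" '
# \. matches a literal dot; an unescaped . matches any single character.
# Accepted: optional sign, one or more digits, optional fractional part,
# or an empty field.
$w_column_pos !~ /^[+-]?[0-9]+(\.[0-9]*)?$|^$/ { print $w_column_pos " is not Numeric" }
' "$src_data_file")

printf '%s\n' "$result"
rm -f "$src_data_file"
```

With this data, only `200B` is reported. Unlike the original pattern, this sketch requires at least one digit, so a bare `+` or `-` is reported as non-numeric. It does not trim whitespace, so a field such as ` 209` (with a leading space) is still reported.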