I need to find out whether a field in a pipe-delimited file is numeric. I need to report the field if it is not numeric, and ignore it if it is numeric or null. I have other computations as well.

I wrote this code:

gawk -v w_column_pos="$column_pos" -F "|" '
$w_column_pos !~ /^([+-]|[0-9])[0-9]*(.[0-9]*)$|^([+-]|[0-9])[0-9]*$|^$/ { print $w_column_pos," is not Numeric"; } ' $src_data_file

-v w_column_pos="$column_pos" passes the column number from the shell into gawk.

The problem is that it does not report an error for values such as 202D or 203B; it accepts one trailing alphabetic character.

But it does report an error for 202DD.

I previously had /^([+-]|[0-9])[0-9]*(.[0-9]*)?$|^$/; this had the same issue.

Sample input file:
Name|Designation|Is Employee| Organisation ID|Hire Date
Alex Conolly|Prof1|TrUE|100|12072015
Thomas |Prof2|TRUE|200B|09072016
Christine prof1|FALSE||24902007
John Martini|PPP|TRUE~FALSE|202|11782099
xxYY |PPP|TRUE|91.67|11782099
ABD S | XXX | FALSEx | 209|11093000

I am asking about the 4th column, Organisation ID, which is a numeric type.

My code works fine otherwise, but 200B (on the 3rd line) is not reported.
DPR

1 Answer

Change the pattern to:

/^([+-]|[0-9])[0-9]*([.][0-9]*)$|^([+-]|[0-9])[0-9]*$|^$/

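The unescaped "." matches any single character, so in (.[0-9]*) it was matching the "B" in 200B, followed by zero digits. Writing it as [.] makes it match only a literal dot. This is also why 202DD was reported: the "." can absorb only one stray character, not two.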
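As a rough sketch of the corrected check in context, the pattern above can be wrapped in a small shell helper. The function name check_numeric, its two arguments, and the use of plain awk instead of gawk are assumptions for illustration and are not part of the original code.

```shell
# Hypothetical helper: print every non-empty, non-numeric value found in
# one column of a pipe-delimited file.
#   $1 = column number (1-based)
#   $2 = path to the data file
check_numeric() {
    awk -v col="$1" -F '|' '
        # [.] matches only a literal dot, so a value such as 200B no longer
        # slips through. The trailing ^$ alternative lets empty fields pass.
        $col !~ /^([+-]|[0-9])[0-9]*([.][0-9]*)$|^([+-]|[0-9])[0-9]*$|^$/ {
            print $col " is not Numeric"
        }
    ' "$2"
}
```

It would be called as check_numeric "$column_pos" "$src_data_file". Note that the pattern is anchored to the whole field, so a value with surrounding spaces (such as " 209" in the sample file) and the header text itself would still be reported.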

joast