
I have a file with contents like this:

9989
-23.
An integer: 23
Number 9090 is cool.
22.33 is not a valid integer
22a33
-111

I run the command `grep -E '\b[-+]?[0-9]+\b' file.txt`.

This command looks for an optional leading + or -, followed by one or more digits 0-9. It has a word boundary (`\b`) at the start and end so that the pattern matches a whole word rather than a substring of a longer word.
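
One subtlety in that description can be checked with a small sketch (assuming GNU grep, where `\b` in `-E` patterns is a GNU extension). Because '-' and '+' are not word characters, the leading `\b` can only sit in front of the sign when a word character immediately precedes it. At the start of a line or after a space, the sign is skipped and the match begins at the first digit. This does not change which lines are printed, but it does change what `-o` prints. The sample strings `-111` and `x-5` are chosen only for illustration.

```shell
#!/bin/sh
# Assumes GNU grep (\b inside -E patterns is a GNU extension).
pattern='\b[-+]?[0-9]+\b'

# Line start and '-' are both non-word, so there is no \b before the sign:
# the match begins at the first digit and -o prints 111, not -111.
sign_at_start=$(printf '%s\n' '-111' | grep -oE "$pattern")
echo "$sign_at_start"

# After a word character ('x') there IS a \b before '-', so the sign is kept.
sign_after_word=$(printf '%s\n' 'x-5' | grep -oE "$pattern")
echo "$sign_after_word"
```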

It outputs the following:

9989
-23.
An integer: 23
Number 9090 is cool.
22.33 is not a valid integer
-111

It matches 22.33, treating 22 as one integer and 33 as another and ignoring the '.' between them. The same does not happen with 22a33: there, the 'a' is not ignored.
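
A minimal way to see this, assuming GNU grep: `-o` prints each match on its own line. Since '.' is not a word character, `\b` holds on both sides of it, so 22.33 yields two matches; 'a' is a word character, so there is no boundary anywhere inside 22a33 and it yields none.

```shell
#!/bin/sh
# Assumes GNU grep; -o prints every match on its own line.
pattern='\b[-+]?[0-9]+\b'

# '.' is a non-word character, so \b holds on both sides of it: two matches.
dot_matches=$(printf '%s\n' '22.33' | grep -oE "$pattern")
echo "$dot_matches"

# 'a' is a word character, so there is no \b inside 22a33: no match at all
# (grep prints nothing and exits with status 1).
letter_matches=$(printf '%s\n' '22a33' | grep -oE "$pattern")
echo "letter matches: [$letter_matches]"
```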

I think the problem is that '.' is not a word character, so `\b` matches on both sides of it: on seeing 22.33, grep matches 22, finds a word boundary at the '.', and then matches 33. How should I fix this?
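
One possible single-grep fix, following the idea that GNU grep's `-P` option supports lookarounds, is to replace both `\b` anchors with lookarounds that also refuse a '.' next to the digits. This is a sketch assuming GNU grep built with PCRE support; the exact characters to reject are assumptions drawn from the sample data: word characters and '.' may not precede the number (so `.23` and `-.23` are ignored), and neither a word character nor a '.' followed by a digit may follow it, so the sentence-final '.' in `-23.` is treated as punctuation. As a side effect, the sign is now included in the match.

```shell
#!/bin/sh
# Sketch only; assumes GNU grep built with PCRE support (-P).
# The sample lines from the question are written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
9989
-23.
An integer: 23
Number 9090 is cool.
22.33 is not a valid integer
22a33
-111
EOF

# (?<![\w.])  : not preceded by a word character or '.', so the "33" in
#               "22.33" and the "23" in "-.23" cannot start a match
# [-+]?\d+    : optional sign, then one or more digits
# (?!\w|\.\d) : not followed by a word character or by '.' plus a digit, so
#               "22" in "22.33" is rejected but "-23." still matches "-23"
pattern='(?<![\w.])[-+]?\d+(?!\w|\.\d)'

matches=$(grep -oP "$pattern" "$file")
echo "$matches"
rm -f "$file"
```

Dropping `-o` prints whole matching lines instead; with this data, that is every line except `22.33 is not a valid integer` and `22a33`.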

Also, I need to count the number of such matches at the end. How should I do that? Should I use the `-o` flag (`grep -o`) to print all the matches and then pipe them to `wc -l`?
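
The two obvious counting approaches differ: `grep -c` counts matching lines, while `grep -o ... | wc -l` counts individual matches, so the results diverge whenever a line holds more than one integer. A small sketch with the original `-E` pattern, assuming GNU grep; the sample lines are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU grep. The sample lines are made up for illustration.
pattern='\b[-+]?[0-9]+\b'
sample='1 and 2 on one line
-7
no digits here'

# -c counts lines that contain at least one match.
lines=$(printf '%s\n' "$sample" | grep -cE "$pattern")

# -o prints each match on its own line; wc -l then counts the matches.
# $(( )) drops any padding spaces that some wc implementations print.
matches=$(( $(printf '%s\n' "$sample" | grep -oE "$pattern" | wc -l) ))

echo "matching lines: $lines"
echo "individual matches: $matches"
```

Here the first line contributes two matches but only one matching line, so the two counts are 2 and 3.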

Divyat
  • A simple workaround would be to pipe the results to `grep -cv '[0-9]\.[0-9]'`. Maybe there's a way to do it with a single grep too, especially if you have GNU grep with the -P option, which supports lookarounds – Sundeep Jan 21 '18 at 15:06
  • But I suppose there are other cases to consider too. What if you have `-.23`? Should that be counted or ignored? – Sundeep Jan 21 '18 at 15:09
  • -.23 should be ignored, as it's not an integer – Divyat Jan 21 '18 at 15:30
  • `grep -cv '[0-9]\.[0-9]'` would lead to cases like 22a33 being considered, which is not desirable – Divyat Jan 21 '18 at 15:51
  • But your grep command would already take care of 22a33; I think you missed the `pipe` part of my comment. You still need to modify the regex to take care of `-.23`, though – Sundeep Jan 21 '18 at 16:14
  • Thanks. I modified your command to `grep -cv '[0-9]*?\.[0-9]+'`. This would take care of cases like .23 and -.23 too – Divyat Jan 21 '18 at 16:55
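
The two-step pipeline discussed in the comments can be sketched as follows, assuming GNU grep and the sample file from the question. One caveat about the last comment: in grep's default basic regex syntax, `?` and `+` are ordinary characters, so `'[0-9]*?\.[0-9]+'` only matches a literal `?.`, then a digit, then a literal `+`, rather than a decimal part. The sketch uses `'\.[0-9]'` instead, which matches any '.' followed by a digit and therefore also covers .23 and -.23.

```shell
#!/bin/sh
# Sketch of the pipeline from the comments; assumes GNU grep.
file=$(mktemp)
cat > "$file" <<'EOF'
9989
-23.
An integer: 23
Number 9090 is cool.
22.33 is not a valid integer
22a33
-111
EOF

# Step 1: keep lines containing a whole-word integer (this already drops 22a33).
# Step 2: -v drops lines where a '.' is followed by a digit (22.33, .23, -.23);
#         -c prints how many lines remain.
count=$(grep -E '\b[-+]?[0-9]+\b' "$file" | grep -cv '\.[0-9]')
echo "$count"
rm -f "$file"
```

Both steps work line by line, so this counts lines rather than matches, and a line that mixes a valid integer with a decimal (for example `3 and 2.5`) is dropped entirely.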
