
I have multiple files in a directory. I want to extract every line, from all the files, that contains an integer value greater than 45.

Currently, I am using:

grep "IO resumed after" *

It shows me all the lines, in all the files, that contain the string "IO resumed after". I want to add one more condition so that grep returns only the lines of the form "IO resumed after [number > 45] seconds".

Asked by Deadpool; edited by Bodo.
  • Welcome to SO. Could you please post a sample of your Input_file and the expected output in your question, in CODE TAGS, for more clarity? – RavinderSingh13 Aug 07 '20 at 09:08
  • Do (or can) the numbers have a decimal point? – Bodo Aug 07 '20 at 10:28
  • You accepted my answer, so it seems to be sufficient for you. Anyway, you should respond to comments and make your question clearer. After re-reading the question I noticed "integer value greater than 45". Can you confirm that the numbers don't contain a decimal point? Additional question: can the numbers have leading zeros, e.g. 0045 instead of 45? – Bodo Aug 07 '20 at 12:34

3 Answers


It is better to use awk for this:

awk 'match($0,"IO resumed after") { if (substr($0,RSTART+RLENGTH)+0 > 45) print }' file

This searches for the string "IO resumed after". If the string is found, awk takes everything after it and converts it to a number by adding zero: if that substring starts with a number (leading blanks are skipped), the result is that number; otherwise the result is 0.
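The "+0" conversion can be tried on its own. The sample strings below are made up for illustration; each one is printed next to the number awk derives from it.

```shell
# Made-up sample strings; awk converts each whole line to a number with "+ 0".
# Leading blanks are skipped, the longest numeric prefix is used,
# and a string that does not start with a number becomes 0.
printf '%s\n' ' 46 seconds' ' 45.5 seconds' '  7abc' 'abc 99' |
  awk '{ printf "[%s] -> %s\n", $0, $0 + 0 }'
```

The last string yields 0 because its number is not at the start, which is why the awk one-liner only looks at the text after "IO resumed after".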

This will only work if the line looks like:

xxxxIO resumed after_nnnnyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

where x and y are arbitrary characters, the underscore stands for any sequence of blanks, and n is a digit.

You can test it with the following set of commands:

$ seq 40 0.5 50 | awk '{print "foo IO resumed after",$0,"random stuff"}' \
  | awk 'match($0,"IO resumed after") { if (substr($0,RSTART+RLENGTH)+0 > 45) print }'

which outputs:

foo IO resumed after 45.5 random stuff
foo IO resumed after 46.0 random stuff
foo IO resumed after 46.5 random stuff
foo IO resumed after 47.0 random stuff
foo IO resumed after 47.5 random stuff
foo IO resumed after 48.0 random stuff
foo IO resumed after 48.5 random stuff
foo IO resumed after 49.0 random stuff
foo IO resumed after 49.5 random stuff
foo IO resumed after 50.0 random stuff
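Since the question runs grep over every file in a directory, the same awk filter can be given several files at once. The sketch below creates two hypothetical log files (a.log and b.log, made up for illustration) and uses awk's FILENAME variable to prefix each match with its file name, similar to what grep prints when it searches several files.

```shell
# Hypothetical sample data: two small log files in a scratch directory.
dir=$(mktemp -d)
printf 'IO resumed after 30 seconds\nIO resumed after 46 seconds\n' > "$dir/a.log"
printf 'other line\nIO resumed after 120 seconds\n' > "$dir/b.log"

# Same filter as above, run over every file in the directory (in a subshell,
# so the caller's working directory is unchanged). FILENAME holds the name of
# the file awk is currently reading, so matches are printed as "file:line".
(
  cd "$dir" &&
  awk 'match($0, "IO resumed after") {
         if (substr($0, RSTART + RLENGTH) + 0 > 45) print FILENAME ":" $0
       }' *
)
```

This should print "a.log:IO resumed after 46 seconds" and "b.log:IO resumed after 120 seconds"; the 30-second line and the unrelated line are skipped.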
Answered by kvantour.

You can use alternatives and repeat counts to define a search pattern for numbers greater than 45.

This solution assumes the numbers are integer numbers without a decimal point.

grep 'IO resumed after \(4[6-9]\|[5-9][0-9]\|[0-9]\{3,\}\) seconds'

or, shorter, with extended regular expressions (grep -E, which replaces the obsolescent egrep command):

grep -E 'IO resumed after (4[6-9]|[5-9][0-9]|[0-9]{3,}) seconds'

I tested the pattern with

for i in 1 10 30 44 45 46 47 48 49 50 51 60 99 100 1234567
do
echo "foo IO resumed after $i seconds bar"
done | grep 'IO resumed after \(4[6-9]\|[5-9][0-9]\|[0-9]\{3,\}\) seconds'

which prints

foo IO resumed after 46 seconds bar
foo IO resumed after 47 seconds bar
foo IO resumed after 48 seconds bar
foo IO resumed after 49 seconds bar
foo IO resumed after 50 seconds bar
foo IO resumed after 51 seconds bar
foo IO resumed after 60 seconds bar
foo IO resumed after 99 seconds bar
foo IO resumed after 100 seconds bar
foo IO resumed after 1234567 seconds bar

If the numbers can have a decimal point, it is harder to write a pattern for numbers > 45, because values such as 45.1 would also have to match.
The following pattern allows a decimal point or comma followed by digits, and implements the condition >= 46 (so values between 45 and 46 are not matched).

grep 'IO resumed after \(4[6-9]\|[5-9][0-9]\|[0-9]\{3,\}\)\([.,][0-9]*\)\{,1\} seconds'
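The decimal-aware pattern can be checked the same way as the integer one. The sample values below are made up; 45.5 is included to show that values between 45 and 46 are dropped, and 99,9 to exercise the comma. Both \| and \{,1\} are GNU extensions to basic regular expressions, so this assumes GNU grep.

```shell
# Made-up sample values run through the decimal-aware pattern (condition >= 46).
for i in 45 45.5 46 46.25 99,9 100 1234.5; do
  echo "foo IO resumed after $i seconds bar"
done | grep 'IO resumed after \(4[6-9]\|[5-9][0-9]\|[0-9]\{3,\}\)\([.,][0-9]*\)\{,1\} seconds'
```

It should print the lines for 46, 46.25, 99,9, 100 and 1234.5, and drop 45 and 45.5.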

2nd edit:

The patterns above don't handle possible leading zeros. As suggested by user kvantour in a comment, the pattern can be extended to handle this. Furthermore, if it is not required to check the seconds part, the pattern for the decimals can be omitted.

Pattern for numbers >= 45 with optional leading zeros:

grep 'IO resumed after 0*\(4[5-9]\|[5-9][0-9]\|[1-9][0-9]\{2,\}\)'
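A quick check of the leading-zero pattern, again with made-up sample values and assuming GNU grep (for \| in a basic regular expression). Note that this pattern implements >= 45, so 45 itself is kept.

```shell
# Made-up sample values, several with leading zeros, run through the >= 45 pattern.
for i in 44 045 0045.5 0000 044 46 0100 999; do
  echo "foo IO resumed after $i seconds bar"
done | grep 'IO resumed after 0*\(4[5-9]\|[5-9][0-9]\|[1-9][0-9]\{2,\}\)'
```

It should print the lines for 045, 0045.5, 46, 0100 and 999; 44, 044 and 0000 are dropped because requiring a nonzero first digit stops zero-padded small numbers from matching the three-digit branch.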
Answered by Bodo.
  • @MarkSetchell If we change the question from "bigger than 45" to "bigger than or equal to 45", the decimal point is irrelevant if the string `" seconds"` is removed from the grep. – kvantour Aug 07 '20 at 12:03
  • Note that the regex `[0-9]\{3,\}` also matches `000` and zero-padded numbers below 45 such as `044`. So it is better to require the first digit to be nonzero and replace it with `[1-9][0-9]\{2,\}`. If you also allow any leading sequence of zeros, the pattern becomes bulletproof: `0*\(4[5-9]\|[5-9][0-9]\|[1-9][0-9]\{2,\}\)` – kvantour Aug 07 '20 at 12:08

Looks like I need to learn awk; until then, here is a bash solution. If the seconds have no decimal point:

while IFS= read -r line; do
    number=${line//*after}        # strip everything up to "after"
    number=${number//seconds*}    # strip "seconds" and what follows
    ((number > 45)) && printf '%s\n' "$line"
done < <(grep "IO resumed after" *)
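A trace of the two parameter expansions on one made-up grep output line shows what the loop actually compares. The file name server1.log is hypothetical; it is there because grep prefixes each match with the file name when it searches several files. This needs bash, not plain sh.

```shell
# Hypothetical grep output line ("file:line", as grep prints for several files).
line='server1.log:IO resumed after 46 seconds'

number=${line//*after}      # longest match of "*after" removed -> " 46 seconds"
number=${number//seconds*}  # "seconds" and the rest removed    -> " 46 "
printf '[%s]\n' "$number"   # brackets make the surrounding blanks visible

# Bash arithmetic ignores the surrounding blanks, so this comparison works.
(( number > 45 )) && echo "kept: $line"
```

The prefix and suffix removal forms ${line##*after} and ${number%%seconds*} give the same result here and are the more common idiom for this kind of trimming.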

If the seconds can contain a decimal point, bash arithmetic cannot compare them, so we have to use bc:

while IFS= read -r line; do
    number=${line//*after}
    number=${number//seconds*}
    # bc prints 1 if the comparison is true and 0 otherwise
    case $(bc <<< "$number > 45") in 1) printf '%s\n' "$line";; esac
done < <(grep "IO resumed after" *)
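If bc is not installed, the decimal comparison can instead be handed to awk. This is a swapped-in technique, not part of the answer above, and is_gt_45 is a hypothetical helper name; awk performs the comparison in floating point, so decimal values work.

```shell
# Swapped-in alternative to bc: let awk do the floating-point comparison.
# is_gt_45 (hypothetical name) succeeds, i.e. exits 0, when its argument is
# numerically greater than 45; "exit !(...)" turns true into status 0.
is_gt_45() {
  awk -v n="$1" 'BEGIN { exit !(n + 0 > 45) }'
}

for number in 45 45.5 46 " 46.0 "; do
  if is_gt_45 "$number"; then
    echo "[$number] is greater than 45"
  else
    echo "[$number] is not greater than 45"
  fi
done
```

In the loop above, the case statement could then be replaced with is_gt_45 "$number" && printf '%s\n' "$line".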
Answered by Ivan.