
I'm trying to have awk print only a specific piece of information. I can do it for simple text strings, but it's not working when I search for and print something like:

/[0-9]*\.[0-9]*\.[0-9]*\.[0-99]*\.[0-999]*/

I'm looking for numbers separated by dots, almost like an IP address, in this format:

#.#.#.##.###, where each # is a digit

Here is what I've tried:

This prints only TEXT and works fine.

awk '{for(i=1;i<=NF;i++){ if($i=="TEXT"){print $i} } }' source.txt > result.txt

This should print what I need, but doesn't work.

awk '{for(i=1;i<=NF;i++){ if($i=="/[0-9]*\.[0-9]*\.[0-9]*\.[0-99]*\.[0-999]*/"){print $i} } }' source.txt > result.txt

This works, but it prints the whole line rather than only what I need:

awk -F"\t" '/[0-9]*\.[0-9]*\.[0-9]*\.[0-99]*\.[0-999]*/{ print }' source.txt > result.txt

What am I doing wrong?
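The last attempt prints the whole line because a bare { print } prints $0, the entire record. A minimal sketch of one way to print only the matched text uses awk's standard match() function, which sets RSTART and RLENGTH for the leftmost match. It assumes the tighter one-digit-per-# pattern discussed in the answers below rather than the question's pattern; the function name print_match and the sample line are made up for illustration.

```shell
#!/bin/sh
# Print only the text that matched, not the whole line.
# match() sets RSTART (start position) and RLENGTH (length) of the leftmost match;
# substr() then extracts exactly that slice of $0.
# print_match is a hypothetical helper name used only for this demo.
print_match() {
  awk 'match($0, /[0-9]\.[0-9]\.[0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]/) {
         print substr($0, RSTART, RLENGTH)
       }'
}

# Made-up tab-separated sample line.
printf 'paid\t2.4.9.76.090\tMonday\n' | print_match   # prints 2.4.9.76.090
```

This only reports the first match on each line; lines with no match print nothing.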

double-beep
ebvogt
  • `==` compares two strings; it doesn't do a regexp match. For that you want `~`. You also want to remove the quotes around the regexp, like this: `awk 'BEGIN { if ("a.c" ~ /a\.*/) print "yes"; else print "no" }'`. – jas Dec 07 '15 at 18:43
  • Thanks, anubhava. I did that and it worked just fine. – ebvogt Dec 07 '15 at 18:53
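To illustrate the point in the comments, here is a small sketch contrasting the two operators. The helper names string_compare and regex_match are hypothetical and exist only for this demo.

```shell
#!/bin/sh
# == is a plain string comparison: "/a.c/" in quotes is just a five-character
# string, so it never equals the text "a.c".
string_compare() {
  awk 'BEGIN { if ("a.c" == "/a.c/") print "equal"; else print "not equal" }'
}

# ~ tests a regular expression; written without quotes, /a\.c/ is a regex literal.
regex_match() {
  awk 'BEGIN { if ("a.c" ~ /a\.c/) print "match"; else print "no match" }'
}

string_compare   # prints: not equal
regex_match      # prints: match
```

The same thing happens in the question's loop: `$i=="/[0-9]*.../"` compares each field with that literal string, so no field ever matches.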

2 Answers


I was using "==" instead of "~", and I had unnecessary quotes around the regexp.

This is working fine:

awk '{for(i=1;i<=NF;i++){ if($i~/[0-9]*\.[0-9]*\.[0-9]*\.[0-99]*\.[0-999]*/){print $i} } }' jfinancas.txt > teste5.txt
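As a sketch of what this loop does, here it is wrapped in a hypothetical helper (find_codes) and run on made-up input, since jfinancas.txt isn't available. Each matching field is printed on its own line. Note that because every digit class is followed by `*` (zero or more), a field made of four bare dots also matches.

```shell
#!/bin/sh
# Same loop and regexp as the command above, reading from stdin instead of jfinancas.txt.
find_codes() {
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /[0-9]*\.[0-9]*\.[0-9]*\.[0-99]*\.[0-999]*/) print $i }'
}

# Made-up sample lines.
printf '%s\n' 'ref 2.4.9.76.090 paid' 'no code here' 'dots .... only' | find_codes
# prints:
# 2.4.9.76.090
# ....
```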
ebvogt

Why are you using a regexp like [0-9]*? The '*' means "any number of digits, including zero". [0-9] alone matches exactly one digit, which is enough. If you want a fixed number of repetitions, just repeat the class: /[0-9]\.[0-9]\.[0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]/ for #.#.#.##.###. Keep the dots escaped: an unescaped . matches any character.

awk '{for(i=1;i<=NF;i++) \
{ if($i ~ /[0-9]\.[0-9]\.[0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]/){print $i} } }' jfinancas.txt > teste5.txt
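One detail worth knowing: the pattern above is not anchored, so it also matches a field that merely contains #.#.#.##.###, such as 12.4.9.76.0901. The sketch below compares it with an anchored variant (^...$ added so the whole field must match); the anchors, the helper names and the sample data are additions for illustration, not part of the answer.

```shell
#!/bin/sh
# Unanchored: the pattern may match anywhere inside the field.
unanchored() {
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /[0-9]\.[0-9]\.[0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]/) print $i }'
}

# Anchored with ^ and $: the entire field must have the #.#.#.##.### shape.
anchored() {
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]\.[0-9]\.[0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]$/) print $i }'
}

sample='2.4.9.76.090 12.4.9.76.0901 2.4.9.7.090'
printf '%s\n' "$sample" | unanchored   # prints 2.4.9.76.090 and 12.4.9.76.0901
printf '%s\n' "$sample" | anchored     # prints only 2.4.9.76.090
```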

Using gawk (GNU awk), you can express repetitions inside the regexp: [0-9]{3} matches exactly 3 digits. (gawk versions older than 4.0 need the --re-interval option for this.)
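For illustration, here is the same #.#.#.##.### check written with interval expressions. This is a sketch assuming gawk 4.0 or later is installed as `gawk`; other awk implementations may not support `{n}`. The helper name find_codes_gawk and the sample line are hypothetical.

```shell
#!/bin/sh
# ([0-9]\.){3} = three "digit followed by dot" groups,
# [0-9]{2} = exactly two digits, [0-9]{3} = exactly three digits.
# Assumes gawk >= 4.0, where interval expressions are enabled by default.
find_codes_gawk() {
  gawk '{ for (i = 1; i <= NF; i++)
            if ($i ~ /([0-9]\.){3}[0-9]{2}\.[0-9]{3}/) print $i }'
}

printf '%s\n' 'ref 2.4.9.76.090 paid' | find_codes_gawk   # prints 2.4.9.76.090
```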

Pierre G.
  • To be honest, my knowledge is very limited, and I'm using '*' because it was included in some code I found helpful. If I don't need it, fine, let's remove it and make the code clearer. I'm looking for numbers formatted as described in the question: the first 3 numbers can be anything from 1 to 9, the 4th number anything from 01 to 99, and the 5th number anything from 001 to 999. Ex.: 2.4.9.76.090 – ebvogt Dec 08 '15 at 01:05
  • No issue ;) But [0-99] does not mean "match any number between 0 and 99", and [0-999] does not either: each matches a single character between '0' and '9'. The [] defines a set of characters, and the - denotes a range between 2 characters. – Pierre G. Dec 08 '15 at 04:10
  • If I have more than one instance of that information in the same line, is there a way to tell the command to print only one instance per line? – ebvogt Dec 15 '15 at 16:59
  • Your loop goes through each and every field and checks whether it matches your pattern, so you just have to break out of the loop at the first match. – Pierre G. Dec 15 '15 at 17:03
  • Alright, so in this case could I also make the code look at only a certain field in each line? This info would only ever be in one specific field of the line, not in all of them. – ebvogt Dec 15 '15 at 17:09
  • Correct: change {print $i} to {print $i; break} and your loop will stop; awk will then process the next line. – Pierre G. Dec 15 '15 at 17:11
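A short sketch of both follow-ups from the comments: first_code applies the `{print $i; break}` change so at most one match is printed per line, and code_in_field tests a single field chosen with `-v` instead of looping. Both helper names, the field number and the sample line are made up for illustration.

```shell
#!/bin/sh
# At most one hit per line: break leaves the for loop after the first match,
# and awk then moves on to the next input line.
first_code() {
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /[0-9]\.[0-9]\.[0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]/) { print $i; break } }'
}

# Check only one known field; $1 of the shell function is the field number,
# passed into awk as the variable f, so $f is that field.
code_in_field() {
  awk -v f="$1" '$f ~ /[0-9]\.[0-9]\.[0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]/ { print $f }'
}

line='x 2.4.9.76.090 1.1.1.11.111'
printf '%s\n' "$line" | first_code        # prints 2.4.9.76.090 only
printf '%s\n' "$line" | code_in_field 3   # prints 1.1.1.11.111
```

If the data is tab-separated, as in the question's -F"\t" attempt, add the same -F"\t" option so field numbers follow the tabs.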