
If we have ip=192.168.0.1 and call split(ip, myArray, "."), myArray will contain "192" at position 1, "168" at position 2, "0" at position 3, and "1" at position 4.

My question is: why does awk not interpret the "." as the "any character" regular expression?

What would I need to do to make awk interpret the "." as the "any character" regular expression for matching?

Will this behaviour be consistent across all awk implementations?

2 Answers

20

This is really a dark corner of awk.

I had the same question about 5 years ago. I submitted it as a bug, discussed it with a gawk developer, and finally got a clear answer: it is a "feature".

Here is the ticket: https://lists.gnu.org/archive/html/bug-gawk/2013-03/msg00009.html

split(str, array, magic)

For magic:

  • When you pass a non-empty string (quoted with "", i.e. "..."), awk checks the length of the string. If it is a single character, it is used as a literal string (they call it a separator). If it is longer than one character, it is treated as a dynamic regex. (A single space " " is a special case: it splits on runs of whitespace.)

  • When you pass a static regex, i.e. in the form /.../, it is always treated as a regex, no matter how long the expression is.

That is:

"."  - literal "." (period)
"["  - literal "["
"{"  - literal "{"
".*" - regex
/./  - regex
/whatever/ - regex

If you want awk to treat . (period) as a regex metacharacter, use split(foo,bar,/./). But if you split on any character, every character becomes a separator, so you get an array whose elements are all empty strings. Make sure this is what you really want.
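The three cases above can be checked from a shell. This is only a minimal sketch: the variable names (`literal`, `anychar`, `dynamic`) are made up for illustration, and the expected results in the comments assume an awk that follows the rules described in this answer.

```shell
ip=192.168.0.1

# Single-character string: "." is used as a literal separator.
literal=$(echo "$ip" | awk '{ n = split($0, a, "."); print n, a[1] }')

# Regex literal: /./ matches any character, so every character is a separator.
anychar=$(echo "$ip" | awk '{ n = split($0, a, /./); print n, "[" a[1] "]" }')

# Multi-character string: ".." is treated as a dynamic regex (any two characters).
dynamic=$(echo "$ip" | awk '{ n = split($0, a, ".."); print n, a[n] }')

echo "$literal"   # 4 192
echo "$anychar"   # 12 []
echo "$dynamic"   # 6 1
```

With /./ the 11 input characters are all separators, which leaves 12 empty elements. With ".." the five two-character matches leave only the final "1" as a non-empty element.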

Kent
  • I actually wanted to split on a literal "." (period) but was just wondering why it works (and awk didn't treat it as a regex). Your answer fully explains my confusion. Do you know if what you explained applies to all awks (like mawk, POSIX awk, etc.) or just gawk? (I see you only mentioned gawk.) EDIT: I looked in the linked ticket and found that other awks work this way too, not just gawk. :) – Maytas Monsereenusorn Apr 07 '17 at 13:51
  • All true. Having said that - since the 3rd arg to split is a regexp, you should use regexp, not string, delimiters for it, and within a regexp the way to specify a literal `.` is to put it inside a bracket expression, so if you just write the code correctly as `split(str,arr,/[.]/)` then this question never even comes up. – Ed Morton Apr 07 '17 at 15:32
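As a small sketch of the suggestion in the last comment, the literal period can be written explicitly as a regex, either as a bracket expression or with a backslash escape. The variable names `bracket` and `escaped` are illustrative only.

```shell
ip=192.168.0.1

# Bracket expression: [.] matches only a literal period.
bracket=$(echo "$ip" | awk '{ n = split($0, a, /[.]/); print n, a[1], a[4] }')

# Backslash escape: \. also matches only a literal period.
escaped=$(echo "$ip" | awk '{ n = split($0, a, /\./); print n, a[1], a[4] }')

echo "$bracket"   # 4 192 1
echo "$escaped"   # 4 192 1
```

Both forms give the same four elements as the single-character string ".", but they do not rely on the single-character special case.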
2

Use the regex literal /./ instead of the string "." if you want every character to act as a delimiter:

$ echo 192.168.0.1 | awk '{ split($0,a,/./); print a[1] }'
$               # only an empty line is printed: every char is a delimiter, so a[1] is empty.
James Brown