
mawk is not POSIX compliant because it does not support POSIX EREs.

To be precise, it does not support named character classes like [[:space:]] within its EREs, which are part of POSIX EREs.

Neither GNU awk nor BusyBox awk seems to have this problem.

I encountered this issue multiple times in my own awk scripts, because I really like [[:space:]] for matching htabs as well as spaces and potentially other locale-specific whitespace with a single character class expression.
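For illustration, here is a minimal sketch of a fallback that avoids named character classes altogether by spelling out the characters explicitly. The function name `trim_ws` is hypothetical, and the sketch assumes that only spaces and horizontal tabs matter; unlike [[:space:]], it will not catch other locale-specific whitespace.

```shell
# Hypothetical helper: strip leading and trailing blanks from each input line.
# "[ \t]" matches only space and horizontal tab, so it should behave the same
# in awk implementations with and without named character class support.
trim_ws() {
    awk '{
        sub(/^[ \t]+/, "")   # remove leading spaces and tabs
        sub(/[ \t]+$/, "")   # remove trailing spaces and tabs
        print
    }'
}
```

Used as a filter, e.g. `some_command | trim_ws`, this trades the generality of [[:space:]] for portability.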

So why do several Linux distros ship a non-POSIX-compliant implementation of such a prominent utility by default, even though POSIX-compliant ones are available?

1 Answer


Looking at http://archive.debian.org, it seems that:

  • mawk appeared around 1997 as 1.3.3
  • busybox appeared around 2002 as 0.60.2
  • busybox finally reached version 1 (1.1.3) in 2006

I would imagine that mawk is still the default for one main reason:

  1. Inertia. It's been packaged as the default for a long time.

Note that mawk is POSIX compliant (in a way). From its manpage:

mawk conforms to the Posix 1003.2 (draft 11.3) definition of the AWK language

Unfortunately that's not the version you care about...

Given how hard it is even to get its version updated:

(both still open, the latter since 2009!!), imagine how hard it would be to get Debian to replace it with something else entirely!

I suspect there is also:

  2. It's really easy to install gawk (or your preferred implementation).
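
If a script has to run on systems where the default awk may or may not understand named character classes, one option is to probe for support at startup and pick an implementation accordingly. This is only a sketch: `awk_has_classes` and `pick_awk` are hypothetical helper names, and the candidate list (gawk, awk, mawk) is an assumption, not something any distro prescribes.

```shell
# Hypothetical helper: succeed if the awk binary named in $1 treats
# [[:space:]] as a named character class (here: it must match a tab).
# An awk without class support reads the pattern differently and
# does not match, so the probe exits non-zero.
awk_has_classes() {
    "$1" 'BEGIN { exit !("a\tb" ~ /^a[[:space:]]b$/) }' 2>/dev/null
}

# Hypothetical helper: print the first candidate that passes the probe.
# Returns non-zero if none of the candidates qualifies.
pick_awk() {
    for candidate in gawk awk mawk; do
        if command -v "$candidate" >/dev/null 2>&1 &&
           awk_has_classes "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A script could then run `AWK=$(pick_awk) || exit 1` once and call `"$AWK"` instead of `awk` from there on.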
jhnc
  • I see. So it complies with an older POSIX version, and named character classes have just been introduced with more recent revisions of the standard. Annoying... but I get your point. Your explanation why it is still present in Debian is also satisfactory. Thanks! Accepting answer. – Guenther Brunthaler Mar 27 '19 at 17:54
  • There may even be an additional benefit from your explanation: I never used the [:blank:] character class in my scripts because I considered it to be "too new", as it was also added by some recent POSIX revision. But considering that named character classes in AWK are a recent addition altogether, I might use [:blank:] just as well... – Guenther Brunthaler Mar 27 '19 at 17:58
  • Following your referenced links, I just downloaded and built "https://invisible-island.net/datafiles/release/mawk.tar.gz". It turned out that this newer version not only supports named character classes, but even the aforementioned new "[:blank:]"! So my problems really aren't mawk's fault at all; it is POSIX compliant even for the most recent revision of the standard. Debian just loves to ship dinosaurs. I usually like that. But not so much in this case. ;-) So I will stick to GNU sed for now. – Guenther Brunthaler Mar 27 '19 at 18:20
  • The two bug reports were eventually resolved in 2020... – jhnc Jul 20 '22 at 09:26