7

I see that all features of AWK are included in GAWK. Besides working on a system that doesn't have GAWK installed, is there ever a good reason to use AWK instead of GAWK? Does AWK perform better than GAWK?

b w
  • What do you mean when you say `awk`? – Etan Reisner Apr 22 '15 at 18:58
  • `gawk` is one particular implementation of the `awk` language. The default `awk` command can be any of several different implementations, depending on the system and how it's configured. You'll need to be more specific. On my Linux system (Linux Mint 17), `/usr/bin/awk` is a symlink to `/etc/alternatives/awk`, which in turn is a symlink to `/usr/bin/gawk`. – Keith Thompson Apr 22 '15 at 19:19

3 Answers

8

awk can refer to many things. There's awk-the-standard, and there are many different implementations, one of which is gawk.

Avoiding implementation-specific features gives you a better chance that your code will run unchanged on other implementations of awk-the-language.

gawk, being one implementation of awk-the-language, claims to conform to awk-the-standard, while adding some extra features.

$ man awk
…
DESCRIPTION
   Gawk is the GNU Project's implementation of the AWK programming
   language.  It conforms to the definition of the language in the
   POSIX 1003.1 Standard.  This version in turn is  based  on  the
   description in The AWK Programming Language, by Aho, Kernighan,
   and Weinberger.  Gawk provides the additional features found in
   the current version of Brian Kernighan's awk and  a  number  of
   GNU-specific extensions.
…

As for speed, using gawk as "plain" awk should make no difference. Often, when gawk is installed, awk is just a symlink to gawk, which means they are exactly the same program.
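To check what `awk` actually is on a given machine, something like the following sketch may help. The helper name `resolve_cmd` is made up for illustration, and `readlink -f` is assumed to be available (it is a GNU coreutils option, so other platforms may lack it). `--version` is understood by gawk; other implementations may reject it, so errors are discarded.

```shell
# Hypothetical helper: report which binary a command name really runs.
resolve_cmd() {
    target=$(command -v "$1") || return 1
    # readlink -f follows the whole symlink chain (assumes GNU readlink);
    # fall back to the unresolved path if that fails.
    readlink -f "$target" 2>/dev/null || printf '%s\n' "$target"
}

# Print the file that the `awk` command ends up executing.
resolve_cmd awk

# Ask the implementation to identify itself. gawk supports --version;
# other awks may not, so stderr is dropped and stdin is closed.
awk --version </dev/null 2>/dev/null | head -n 1
```

If the first line of output ends in `gawk`, then "awk" and "gawk" are the same program on that system and there is nothing to compare.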

However, using gawk-specific features means you'll be locked into that specific implementation, so if (hypothetically) you found a faster implementation, you'd probably have to adapt your script instead of just swapping out the binary. (There may be faster implementations, but I don't know of any, as I've never needed to make my awk scripts run faster.)

Personally, I tend to stick to "plain" awk and not use gawk-specific features, but if you don't care about switching to another implementation, using gawk extensions might make your script easier to write and save you time on that end.
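As a small illustration of that trade-off, here is a sketch of a case-insensitive match written with POSIX features only. The helper name `print_foo_field` and the sample input are made up for the example. The gawk-specific variant, which relies on gawk's `IGNORECASE` variable, is shown only as a comment so the snippet runs without gawk installed.

```shell
# Hypothetical helper: print field 2 of every line containing "foo" in any case.
# tolower() is part of POSIX awk, so this should work on gawk, mawk and others.
print_foo_field() {
    awk 'tolower($0) ~ /foo/ { print $2 }'
}

printf 'Foo one\nbar two\nFOO three\n' | print_foo_field

# gawk-only alternative using the IGNORECASE extension (not portable):
#   gawk 'BEGIN { IGNORECASE = 1 } /foo/ { print $2 }'
```

The portable version is a little more verbose, but it can be moved to another implementation without edits.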

nobody
2

Nowadays the most common implementation of AWK is gawk, and possibly the second most common is mawk, if only because it's the system AWK on Debian.

To quote the output of `apt-cache show mawk`:

Mawk is smaller and much faster than gawk. It has some compile-time limits such as NF = 32767 and sprintf buffer = 1020.

On the gawk side, there is a larger number of well-thought-out extensions and, I think, better error handling and better error messages. These are a real bonus when you're debugging a complex script and could be a good reason to use gawk even if you're not interested in its extensions.

On the other hand, if you have a debugged script, if you don't need a particular extension, if you can live with the builtin limits of mawk (that's a lot of ifs) and you want to squeeze the last bit of performance without leaving the comfort of AWK, then mawk is the way to go.

gboffi
1

Assuming that by "AWK" you mean any awk that is not gawk: no. Always use gawk if at all possible. If it is not on your system, install it.

Ed Morton
  • Shouldn't you use mawk if you need faster execution? –  Apr 23 '15 at 07:12
  • Once upon a time, when mawk was a minimally featured awk streamlined for performance, that was the case, but now mawk supports more gawk extensions than any other awk (except the defunct tawk), so it's gone in a new direction. In any case, awk performance issues are usually related to the design of the script being executed, not the awk implementation executing it, and all awk scripts typically run about as fast as equivalent compiled C programs, so IMHO it's best to just stick with gawk. – Ed Morton Apr 23 '15 at 12:34
  • I doubt I will ever be able to find it, but I remember a question long ago, and I think it was even you who commented with the timings. It was a question about doing some sort of string manipulation. My answer (before I deleted my account) was about 6 times faster in mawk using substr than all the others, so I have always assumed it would be quicker. Also, in my experience C code is about twice as fast as any similar awk program. –  Apr 23 '15 at 12:38
  • The minimally featured mawk sat dormant for a long time, then got picked up and maintained by someone in the past couple of years, and they've been adding features to it, so I suspect that if you still had the older version you'd see a difference. It might depend on what you're doing with it too. I don't know; essentially I'm saying gawk is fast enough, so use it for all the other reasons. The awk code to read records, split them into fields, and otherwise manipulate strings is so highly optimized compared to the C code you'd write to do it that the result is about a wash for typical scripts that are just manipulating text. – Ed Morton Apr 23 '15 at 12:46
  • By the way, from what I've read recently, the same is not true for Perl, courtesy of its regexp engine. Apparently processing Perl-style regexps is quite processor-intensive and, unfortunately for performance, Perl doesn't first check whether the RE uses any PCRE extensions, so every regexp is processed by the same engine and every regexp operation incurs a significant performance hit. – Ed Morton Apr 23 '15 at 12:50
  • I always knew/thought Perl was slower but never knew why, thanks for the info :) –  Apr 23 '15 at 12:56
  • If you'd ever be interested in writing a C program that does the equivalent of something like `/foo/{print $2, $1}` and checking the time stats on, say, a 1 million line file where half the lines contain `foo`, I'd be very interested in the results of comparing C, Perl, gawk, and mawk (and others could chime in with Python or whatever too). Maybe post it as a canonical question/answer about the relative performance, as I couldn't find one right now? Tell you what, I'll post the question and gawk stats and ask for input. – Ed Morton Apr 23 '15 at 13:03