6

I have a file:

To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8, 
żeby 
było śmieszniej, haha.
ą
a

Example gawk:

gawk '{printf "%-80s %-s\n", $0, length}' file

In gawk, I get the correct result:

To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,         73
żeby                                                                             5
było śmieszniej, haha.                                                           22
ą                                                                                1
a                                                                                1


Example mawk:

mawk '{printf "%-80s %-s\n", $0, length}' file
To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,  80
żeby                                                                            6
było śmieszniej, haha.                                                         24
ą                                                                               2
a                                                                                1

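The mawk result above is incorrect: the lengths are counted in bytes rather than characters, and the padding is off accordingly.

How can I make mawk give the same result as gawk?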

Zombo
Tedee12345

3 Answers

17

mawk is a minimal-featured awk designed for speed of execution over functionality. You should not expect it to behave exactly the same as gawk or a POSIX awk. If you're going to use mawk, you need a mawk manual describing how IT behaves; don't rely on documentation describing how other awks behave.

IMHO there is no correct result for the formatting string %-s, as it is meaningless to left-align a string without specifying a width within which to align it. There are also different interpretations of what length means on its own: it could be shorthand for length($0), or it could be something else in a non-POSIX awk. Some non-POSIX awk might not even have a length function and so might treat it as an undefined variable name. And how does any given awk handle non-English characters?

As I said - if you're going to use a non-POSIX awk, you need to check the manual for THAT awk for all of the gory details...
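
To illustrate removing the ambiguity on the awk side, here is a minimal sketch that gives the field an explicit width and gives length an explicit argument. The name len_table is hypothetical, chosen only for this example. This makes the program unambiguous, but whether length($0) counts bytes or characters still depends on the awk in use.

```shell
#!/bin/sh
# len_table: hypothetical helper name used only for this sketch.
len_table() {
    # "%-80s" left-aligns in an explicit 80-wide field;
    # length($0) names its argument instead of relying on bare "length".
    awk '{ printf "%-80s %d\n", $0, length($0) }' "$@"
}

# Example with ASCII-only input, where bytes and characters coincide.
printf 'abc\nde\n' | len_table
```

It can also be called with a file name, e.g. `len_table file`.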

Wirawan Purwanto
Ed Morton
  • `mawk` is a POSIX compliant awk. – teppic Oct 05 '15 at 02:42
  • 7
    @teppic - I'm afraid it's not. While [mawk's man page](http://invisible-island.net/mawk/manpage/mawk.html#h3-3_-Regular-expressions) claims that it supports Extended Regular Expressions, mawk fails to implement POSIX character classes like `[:digit:]`, `[:upper:]`, `[:lower:]`, etc. before version 1.3.4, and a number of Linux distros currently ship with version 1.3.3. So: not POSIX compliant in practice. – ghoti Mar 10 '16 at 03:04
  • Update: in the above I was referring to `mawk1`, there is now a `mawk2` available (?) which apparently shares some functionality and code (?) with `gawk` so you should read the man page for THAT version of `mawk` if interested in it. – Ed Morton Feb 05 '23 at 01:52
0

UPDATE 1: I realized I could massively streamline it.

  • The only thing needed is to add the count of UTF-8 continuation bytes back into the total width. With FS defined as the continuation-byte range, that count is always NF - 1 for non-empty lines, and the number printed at the end of each line is the UTF-8 character count (strictly speaking, a code-point count).

    Caveat: this code takes a leap of faith and assumes the input is valid UTF-8 to begin with; it performs no data validation.

Works with mawk 1, mawk 2, or gawk -b:

mawk '$!NF = sprintf("%-*s %s",(__=NF-!_)+80,$_,length($_)-__)' FS='[\\200-\\277]' file

Output:


To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,         73
żeby                                                                             5
było śmieszniej, haha.                                                           22
ą                                                                                1
a                                                                                1
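
The same idea can be spelled out more verbosely for readability. This is only a sketch under the same assumption of valid UTF-8 input. The function name utf8_len_table and the variable names are made up for illustration, and LC_ALL=C is used here (instead of gawk -b) to force byte semantics in whichever awk is installed.

```shell
#!/bin/sh
# utf8_len_table: hypothetical helper name, not part of any awk distribution.
# Assumes valid UTF-8 input; no validation is performed.
utf8_len_table() {
    # LC_ALL=C makes length() and %s padding count bytes in every awk.
    LC_ALL=C awk '{
        line  = $0
        bytes = length(line)
        # gsub returns how many continuation bytes (0x80-0xBF) it removed.
        cont  = gsub(/[\200-\277]/, "", line)
        # Widen the byte-based field by cont so the text fills 80 character columns.
        fmt   = "%-" (80 + cont) "s %d\n"
        printf fmt, $0, bytes - cont
    }' "$@"
}

# Example: "ą" (2 bytes, written as octal escapes) and "a" (1 byte).
printf '\304\205\na\n' | utf8_len_table
```

Unlike the one-liner, this version counts continuation bytes with gsub on a copy of the line rather than through FS and NF, so empty lines need no special handling.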
RARE Kpop Manifesto
-1

I assume you are using different systems, because the awk installed on a system is usually a symlink to either gawk or mawk.

All awk versions are compatible as long as the versions coincide.

I therefore assume that the issue you are facing is due to the use of an older and a newer version of the programs.

runlevel0
  • 2
    First, different awk implementations are not necessarily compatible, as also mentioned by other answers. Second, it's not even hard to install different awk implementations on the same system and call them using absolute paths. So your answer is neither correct nor particularly helpful. – jena Nov 04 '20 at 15:38