
I have this (demo) text in the variable ArtTEXT.

{1}: Reporting Problems and Bugs. 
{2}: Other freely available awk implementations. 
{3}: Summary of installation. 
{4}: How to disable certain gawk extensions. 
{5}: Making Additions To gawk. 
{6}: Accessing the Git repository. 
{7}: Adding code to the main body of gawk. 
{8}: Porting gawk to a new operating system.  
{9}: Why derived files are kept in the Git repository. 

It is a single variable in which the lines are delimited by an indent.

indent = "\n\t\t\t";

I want to loop through the lines and check something in each line.

So I split it into an array, using the indent as the separator:

split(ArtTEXT,lin, indent);

Then I loop through the array lin

l = 0;
for (l in lin) {
    print "l -- ", l, " lin[l] -- " ,lin[l] ;
}

What I get are the lines of ArtTEXT, but the output begins at line #4:

l --  4  lin[l] --  {3}: Summary of installation. 
l --  5  lin[l] --  {4}: How to disable certain gawk extensions. 
l --  6  lin[l] --  {5}: Making Additions To gawk. 
l --  7  lin[l] --  {6}: Accessing the Git repository. 
l --  8  lin[l] --  {7}: Adding code to the main body of gawk. 
l --  9  lin[l] --  {8}: Porting gawk to a new operating system.  
l --  10  lin[l] --  {9}: Why derived files are kept in the Git repository. 
l --  1  lin[l] --   
l --  2  lin[l] --  {1}: Reporting Problems and Bugs. 
l --  3  lin[l] --  {2}: Other freely available awk implementations. 

(The original text has an empty line at the beginning.)

The manual says about the split function:

The first piece is stored in array[1], the second piece in array[2], and so forth.

How do I avoid this problem?

Why is this happening?

Thanks.

Hanan Cohen

1 Answer


In awk, arrays are unordered. If they happen to come out in order, it is accidental.

In GNU awk (version 4.0 or later), it is possible to control the order. For example, to get numerical ordering by indices, use PROCINFO["sorted_in"]="@ind_num_asc":

$ awk -v ArtTEXT="$(cat file)" 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"; indent="\n\t\t\t"; split(ArtTEXT, lin, indent); for (l in lin) print "l -- ", l, " lin[l] -- " ,lin[l] ;}'
l --  1  lin[l] --  {1}: Reporting Problems and Bugs. 
l --  2  lin[l] --  {2}: Other freely available awk implementations. 
l --  3  lin[l] --  {3}: Summary of installation. 
l --  4  lin[l] --  {4}: How to disable certain gawk extensions. 
l --  5  lin[l] --  {5}: Making Additions To gawk. 
l --  6  lin[l] --  {6}: Accessing the Git repository. 
l --  7  lin[l] --  {7}: Adding code to the main body of gawk. 
l --  8  lin[l] --  {8}: Porting gawk to a new operating system.  
l --  9  lin[l] --  {9}: Why derived files are kept in the Git repository. 

Alternatively, since the array indices are numerical, we can loop numerically, using for (l=1;l<=length(lin);l++) print...:

$ awk -v ArtTEXT="$(cat file)" 'BEGIN{indent="\n\t\t\t"; split(ArtTEXT, lin, indent); for (l=1;l<=length(lin);l++) print "l -- ", l, " lin[l] -- " ,lin[l] ;}'
l --  1  lin[l] --  {1}: Reporting Problems and Bugs. 
l --  2  lin[l] --  {2}: Other freely available awk implementations. 
l --  3  lin[l] --  {3}: Summary of installation. 
l --  4  lin[l] --  {4}: How to disable certain gawk extensions. 
l --  5  lin[l] --  {5}: Making Additions To gawk. 
l --  6  lin[l] --  {6}: Accessing the Git repository. 
l --  7  lin[l] --  {7}: Adding code to the main body of gawk. 
l --  8  lin[l] --  {8}: Porting gawk to a new operating system.  
l --  9  lin[l] --  {9}: Why derived files are kept in the Git repository. 

Multi-line versions

The GNU code shown over multiple lines looks like:

awk -v ArtTEXT="$(cat file)" '
BEGIN{
    PROCINFO["sorted_in"]="@ind_num_asc"
    indent="\n\t\t\t"
    split(ArtTEXT, lin, indent)
    for (l in lin)
        print "l -- ", l, " lin[l] -- " ,lin[l]
}'

And the alternative code is:

awk -v ArtTEXT="$(cat file)" '
BEGIN{
    indent="\n\t\t\t"
    split(ArtTEXT, lin, indent)
    for (l=1;l<=length(lin);l++)
        print "l -- ", l, " lin[l] -- " ,lin[l]
}'
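
A further variant, offered only as a sketch: split() returns the number of pieces it created, so that count can be saved and used as the loop bound instead of calling length() on the array. Calling length() with an array argument is a common extension rather than something every awk supports, so this form may be more portable. The sample text and the names n and i below are assumptions made for illustration; they are not from the question.

```shell
out=$(awk '
BEGIN {
    # Hypothetical sample data: three items separated by newline + three tabs
    ArtTEXT = "{1}: first\n\t\t\t{2}: second\n\t\t\t{3}: third"
    indent = "\n\t\t\t"
    n = split(ArtTEXT, lin, indent)   # split() returns the number of pieces
    for (i = 1; i <= n; i++)          # visit indices 1..n in numeric order
        print i, lin[i]
}')
printf '%s\n' "$out"
```

With the sample data above this prints the three items in index order, 1 through 3.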
John1024
  • Thank you @john1024. PROCINFO["sorted_in"]="@ind_num_asc" didn't work no matter where I put it in the code. Looping using for (l=1;l<=length(lin);l++) did the trick. – Hanan Cohen Aug 31 '15 at 07:45
  • What OS are you on? The `PROCINFO["sorted_in"]="@ind_num_asc"` statement only works for GNU awk. That means, for example, that it can't be used on Mac OSX. – John1024 Aug 31 '15 at 08:12
  • I am using gawk-3.1.6 on Windows 10. – Hanan Cohen Aug 31 '15 at 08:41
  • @HananCohen Thanks: that explains it. I found the documentation for v3.1.6 [here (zip)](http://gnuwin32.sourceforge.net/downlinks/gawk-doc-zip.php). It turns out that in [version 3.1.6](http://gnuwin32.sourceforge.net/packages/gawk.htm), which dates back to 2008, gawk didn't yet support `PROCINFO["sorted_in"]`. – John1024 Aug 31 '15 at 16:18
  • A couple of nitpicks: awk arrays are hash tables, so it's not correct to say they aren't ordered; they just aren't stored in numerical, alphabetical, time-of-entry, or any other order you'd typically want to access them in. By default, the `in` operator just walks the hash table pulling out consecutive entries. It's when you use `PROCINFO["sorted_in"]` to specify a particular order that `in` pulls out the entries in that order instead of the default. The order in which the array is stored is completely unaffected by that; it's just the `in` operator that changes. – Ed Morton Aug 31 '15 at 17:50
  • When used as `for (i in array)`, the `in` operator walks the array, but when used as `if (i in array)` it just tests whether or not `i` is a valid index of the array. So for a dense array such as one populated by `split()`, instead of writing `for (i=1;i<=length(array);i++)` you can just write `for (i=1;i in array;i++)`. Finally, don't use the letter `l` as a variable name: it's so hard to distinguish from the number `1` that it obfuscates your code. – Ed Morton Aug 31 '15 at 17:53
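
The `for (i=1; i in array; i++)` form from the last comment can be sketched roughly as below. The sample text is hypothetical, and the parentheses around the membership test are an optional choice made here for readability. The loop relies on the array being dense (indices 1, 2, 3, ... with no gaps), which holds for an array filled by `split()`.

```shell
out2=$(awk '
BEGIN {
    text = "alpha\n\t\t\tbeta\n\t\t\tgamma"   # hypothetical sample data
    split(text, lin, "\n\t\t\t")
    # Here "i in lin" is a membership test, not an iteration:
    # the loop ends at the first index that is not in the array.
    for (i = 1; (i in lin); i++)
        print i, lin[i]
}')
printf '%s\n' "$out2"
```

Because the test only checks whether the index exists, it does not create new array elements as a side effect, and the loop needs neither `length()` nor a saved element count.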