
I'm fairly new to awk, and I'm writing a script that reads the contents of a file, processes it, and then appends the result to a few files based on the result. The script works on a file containing about 100 lines but fails for a file containing 125k lines. I'm not sure whether the issue is the way I'm doing things here, because I've seen awk work fine with larger files.

Here's my code: FileSplitting.awk

BEGIN { print "Splitting file " }
{
    print NR
    r = int($2/1024)
    if (r > 5)  { print $0 >> "testFile" }
    if (r <= 5) { print $0 >> "testFile2" }
}
END { print "Done" }

I'm invoking the script like this:

awk -F"," -f FileSplitting.awk test.csv
Aryan

1 Answer


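The issue is that you're using the wrong output redirection operator: you should be using > rather than >>. Awk does not treat these two operators the same way the shell does. In awk, > truncates the file only the first time it is opened during a run; every later print to that file appends to it while it stays open. >> never truncates, so anything already in the file is kept. See man awk for how those operators work in awk, and change your script to: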

BEGIN { print "Splitting file " }
{
    print NR
    r = int($2/1024)
    if (r > 5)  { print $0 > "testFile" }
    if (r <= 5) { print $0 > "testFile2" }
}
END { print "Done" }

to get it to work, and then clean it up to:

BEGIN { print "Splitting file " }
{ print NR; print > ("testFile" (int($2/1024)>5?"":"2")) }
END { print "Done" }
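To see how the cleaned-up version routes each line, here is a small sketch run against a hypothetical four-line sample.csv whose second field is assumed to be a size in bytes (the file name and values are made up for illustration). Lines whose size is more than 5 KiB after integer division go to testFile; the rest go to testFile2.

```shell
#!/bin/sh
# Hypothetical sample input: field 2 is assumed to be a size in bytes.
cat > sample.csv <<'EOF'
a,1024
b,6144
c,5119
d,10240
EOF

# Remove output from earlier runs: a file that receives no lines in this
# run would otherwise keep its old contents.
rm -f testFile testFile2

# The cleaned-up script from the answer, inlined.
awk -F',' '
BEGIN { print "Splitting file " }
{ print NR; print > ("testFile" (int($2/1024)>5?"":"2")) }
END { print "Done" }
' sample.csv

echo "testFile:";  cat testFile    # b,6144 and d,10240 (6 and 10 KiB)
echo "testFile2:"; cat testFile2   # a,1024 and c,5119 (int() gives 1 and 4)
```

Note that 5119/1024 is just under 5, so int() truncates it to 4 and that line lands in testFile2.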

You do NOT need to close the files after every write. With >, doing so would actually make the next write reopen and truncate the file again.
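A quick way to check this with a file the size of the one in the question is the sketch below. It assumes a seq command is available (as on typical Linux and macOS systems) and uses a made-up output name, out.txt. Every line goes through a single > redirection, and awk closes the file itself when it exits.

```shell
#!/bin/sh
# Write 125,000 lines through one awk output redirection.
# awk opens out.txt once (truncating it), keeps it open for the whole run,
# and closes it automatically at exit, so no close() call is needed.
seq 125000 | awk '{ print > "out.txt" }'

# Count the lines that were written.
wc -l < out.txt
```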

In response to @Aryan's comment below, here are the shell equivalents of awk's > and >> operators:

1) awk's >

awk:
    { print > "foo" }

shell equivalent:

    > foo
    while IFS= read -r var
    do
        printf "%s\n" "$var" >> foo
    done

2) awk's >>

awk:
    { print >> "foo" }

shell equivalent:

    while IFS= read -r var
    do
        printf "%s\n" "$var" >> foo
    done
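The difference shows up when the same program is run more than once. This sketch (the file names in.txt, gt.txt and gtgt.txt are made up for the example) runs each version twice on a three-line input:

```shell
#!/bin/sh
# A 3-line input and a clean starting state.
printf 'x\ny\nz\n' > in.txt
rm -f gt.txt gtgt.txt

for run in 1 2; do
    # ">" truncates gt.txt when awk first opens it in this run, then appends.
    awk '{ print > "gt.txt" }' in.txt
    # ">>" never truncates, so every run adds to what is already there.
    awk '{ print >> "gtgt.txt" }' in.txt
done

wc -l < gt.txt     # 3 lines: only the second run's output survives
wc -l < gtgt.txt   # 6 lines: both runs' output accumulated
```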
Ed Morton
  • Oops! I assumed that `>` and `>>` work the same way in awk as in the shell; thanks for letting me know. Can you point me to any reference where I could learn about their differences? – Aryan Aug 11 '13 at 16:22
  • `"man awk for how those operators work in awk"` note the ternary operator is `GNU awk` only, +1. – Chris Seymour Aug 11 '13 at 16:32
  • The ternary operator has been part of the awk language since the late 1980s. It's not gawk-only; it should be supported in all modern awks. – Ed Morton Aug 11 '13 at 17:54
  • @Aryan I posted examples of the 2 awk operations and their shell equivalents. – Ed Morton Aug 11 '13 at 18:00
  • Tell that to Apple, try `awk 'BEGIN{print 1==1?1:0}'` with the default implementation of `awk` on OSX. – Chris Seymour Aug 11 '13 at 18:35
  • I don't have it so I can't but IMHO if it doesn't support ternary operators then it's a broken awk. It's certainly non-POSIX-compliant at best and so should be replaced by anyone planning to use awk on that platform. – Ed Morton Aug 11 '13 at 19:10
  • @sudo_O I'd never use a ternary operator without some kind of parens (I'm actually surprised it works without parens but apparently it does in gawk at least) so I'm curious - does the Apple awk you mentioned work with `awk 'BEGIN{print (1==1?1:0)}'` or `awk 'BEGIN{print (1==1)?1:0}'`. – Ed Morton Aug 12 '13 at 14:41
  • Yes `awk 'BEGIN{print (1==1?1:0)}'` works on Mac, the latter does not. Good to know! Nice one @EdMorton – Chris Seymour Aug 12 '13 at 14:50
  • @sudo_O - ah, that makes sense. I guess `print a ? b : c` must be syntactically ambiguous - it certainly is in my head at least so I always use `print (a ? b : c)` :-). Thanks for checking it out. – Ed Morton Aug 12 '13 at 15:51
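Following the advice in these comments, here is a sketch that keeps the ternary fully parenthesized, both in a plain print and when building a file name the way the cleaned-up script does (the input value 7168 and the variable name f are just examples):

```shell
#!/bin/sh
# Parenthesize the whole ternary so print receives one unambiguous
# expression; per the comments, this form works in gawk and in the
# default macOS awk.
awk 'BEGIN { print (1 == 1 ? "yes" : "no") }'   # prints: yes

# Same habit when building an output file name: 7168/1024 is 7, which is
# greater than 5, so the ternary yields "" and the name is "testFile".
echo '7168' | awk '{ f = "testFile" (int($1/1024) > 5 ? "" : "2"); print f }'   # prints: testFile
```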