
Awk has the built-in variables FNR and NR for the number of records (usually lines) read from the current file and in total, respectively.

In awk, it is common to have:

$ awk 'FNR==NR { process first file lines; next } { process remaining lines }' f1 f2

Commonly, f1 contains values that determine how to process the remaining files (keywords, line numbers, etc.).

Ruby has the makings of a fabulous text-processing language. Ruby has $. as the equivalent of awk's NR. What is the equivalent of FNR?

dawg

1 Answer

Given:

$ head f?.txt
==> f1.txt <==
line 1
line 2

==> f2.txt <==
line 3
line 4

Ruby has the ARGF stream (aliased as $< if you want to feel Perly) that either reads STDIN or opens a file from the command line. Same behavior as awk:

$ awk '{
    printf("FILENAME: %s, FNR: %s, NR: %s, %s\n", FILENAME, FNR,NR,$0)}
' f?.txt
FILENAME: f1.txt, FNR: 1, NR: 1, line 1
FILENAME: f1.txt, FNR: 2, NR: 2, line 2
FILENAME: f2.txt, FNR: 1, NR: 3, line 3
FILENAME: f2.txt, FNR: 2, NR: 4, line 4

$ ruby -lne '
    printf("FILENAME: %s, FNR: %s, NR: %s, %s\n", $<.file.path, $<.file.lineno, $., $_)
' f?.txt
FILENAME: f1.txt, FNR: 1, NR: 1, line 1
FILENAME: f1.txt, FNR: 2, NR: 2, line 2
FILENAME: f2.txt, FNR: 1, NR: 3, line 3
FILENAME: f2.txt, FNR: 2, NR: 4, line 4

If you want to read both STDIN and a file, use - as a placeholder for STDIN in the file list:

$ echo '123' | awk '1' - <(echo 456)
123
456
$ echo '123' | awk '1' <(echo 456) -
456
123

$ echo '123' | ruby -lne 'puts $_' - <(echo 456)
123
456
$ echo '123' | ruby -lne 'puts $_' <(echo 456) -
456
123

Some more corresponding variables:

╔══════════╦═══════════════════╦═════════════════════════════════════════╗
║   awk    ║       ruby        ║                 comment                 ║
╠══════════╬═══════════════════╬═════════════════════════════════════════╣
║ $0       ║ $_                ║ unsplit record (line usually)           ║
║ NF       ║ $F.length         ║ Number of fields from autosplit         ║
║ FNR      ║ ARGF.file.lineno  ║ Number records read from current source ║
║ NR       ║ ARGF.lineno or $. ║ Total number of records so far          ║
║ (magic)  ║ ARGF or $<        ║ stream from either STDIN or a file      ║
║ $1..$NF  ║ $F[0]..$F[-1]     ║ First to last field from autosplit      ║
║ FS       ║ $;                ║ Input field separator                   ║
║ RS       ║ $/                ║ Input record separator                  ║
║ FILENAME ║ $<.file.path      ║ Filename of file being processed        ║
╚══════════╩═══════════════════╩═════════════════════════════════════════╝      
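As a quick sketch of how the $; and $/ rows of that table behave in plain Ruby (no command-line flags; note that assigning $; is deprecated in recent Rubies but still works):

```ruby
# $; -- like awk's FS: String#split with no argument falls back to it.
# (Assigning $; prints a deprecation warning in newer Rubies.)
$; = ","
fields = "a,b,c".split
puts fields.length          # plays the role of NF -> 3
puts fields[0], fields[-1]  # $1 and $NF -> a, c

# $/ -- like awk's RS: the record separator used by gets/each_line/chomp.
require "stringio"
$/ = ";"
records = StringIO.new("rec1;rec2;rec3").each_line.map(&:chomp)
p records                   # ["rec1", "rec2", "rec3"]
```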

So if you had a list of line numbers in f1 and a text file f2 that you wanted to index with those line numbers (something you would typically use awk or sed for), it is possible to use Ruby.

Given:

$ echo "1
2
44
2017" >f1
$ seq 10000 | awk '{print "Line", $1}' >f2

In awk you would do:

$ awk 'FNR==NR{ln[$1]; next} 
       FNR in ln'    f1 f2

In Ruby you could do:

$ ruby -lane 'BEGIN{h=Hash.new}
              if $<.file.lineno == $<.lineno
                 h[$F[0].to_i]=true
                 next
              end
              puts $_ if h[$<.file.lineno]' f1 f2

Both print:

Line 1
Line 2
Line 44
Line 2017

The awk version of this example is roughly 5x faster (go awk!) but the Ruby version easily supports inputs that awk couldn't handle, such as JSON, XML, complex CSV, etc.
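To illustrate that last point, a minimal sketch using inline JSON Lines data (the records here are hypothetical, not from the question): the same keep-records-by-number idea, applied to input awk could not parse natively.

```ruby
require "json"

# Keep records whose "line" number is in a wanted set.
wanted = [1, 44]
input = <<~JSONL
  {"line": 1, "text": "Line 1"}
  {"line": 2, "text": "Line 2"}
  {"line": 44, "text": "Line 44"}
JSONL

input.each_line do |l|
  rec = JSON.parse(l)
  puts rec["text"] if wanted.include?(rec["line"])
end
# prints:
# Line 1
# Line 44
```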

dawg
    FYI GNU awk has an XML parser and a CSV parser is under development. No JSON in the forecast AFAIK. See https://www.gnu.org/software/gawk/manual/html_node/gawkextlib.html for other formats gawk has a parser for. – Ed Morton Jul 07 '17 at 00:52