
I have a ton of files in subfolders, each containing three columns of numbers. I need to locate the largest number in $2 and then print $1 and $2 from that row.

This is what I got:

awk 'FNR > 1 {max=dist=0; if($2>max){dist=$1; max=$2}}END{print FILENAME "   distance: " dist "   max: " max}' ./nVT_*K/rdf_rdf_aam_aam_COM.dat

This works, but it only prints values for the last input file. I need one line from each.

Iterating using a bash for loop produced a "command not found" for the awk part. I am currently piping the echoed for loop output to a file and running as a script, though this is not a feasible plan in the long run.

Can anyone help rework this so that it takes a bunch of input files in different subfolders and prints the intended result from each file, like this:

./nVT_277K/rdf_rdf_aam_aam_COM.dat   distance: 4.650000   max: 1.949975
./nVT_283K/rdf_rdf_aam_aam_COM.dat   distance: 4.650000   max: 1.943047
./nVT_289K/rdf_rdf_aam_aam_COM.dat   distance: 4.650000   max: 1.907280
...
...
...

I'd be extremely grateful for any input here. Thanx

gusgrave

2 Answers


With GNU awk for ENDFILE:

awk '
    FNR > 1 { if ((max=="") || ($2>max)) {dist=$1; max=$2} }
    ENDFILE { print FILENAME "   distance: " dist "   max: " max; max=dist="" }
' ./nVT_*K/rdf_rdf_aam_aam_COM.dat

With any awk, assuming your input files are not empty:

awk '
    FNR==1 { if (NR>1) print fname "   distance: " dist "   max: " max; max=dist=""; fname=FILENAME; next }
    (max=="") || ($2>max) { dist=$1; max=$2 }
    END { print fname "   distance: " dist "   max: " max }
' ./nVT_*K/rdf_rdf_aam_aam_COM.dat
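
If you want to try the portable version without the real data, here is a minimal sketch that runs it on two small made-up files. The directory names, the file name `rdf.dat`, the header line, and the numbers are all hypothetical stand-ins for the real rdf files; only the awk program is the one shown above.

```shell
#!/bin/sh
# Hypothetical sample data: one header line, then "distance value extra" rows.
tmp=$(mktemp -d)
mkdir "$tmp/nVT_277K" "$tmp/nVT_283K"
printf '%s\n' 'r g extra' '1.0 0.5 0' '2.0 1.9 0' '3.0 1.2 0' > "$tmp/nVT_277K/rdf.dat"
printf '%s\n' 'r g extra' '1.0 0.7 0' '2.0 0.9 0' '3.0 1.4 0' > "$tmp/nVT_283K/rdf.dat"

result=$(cd "$tmp" && awk '
    # First line of each file: report the previous file (if any),
    # reset the running maximum, remember the new name, skip the header.
    FNR==1 { if (NR>1) print fname "   distance: " dist "   max: " max; max=dist=""; fname=FILENAME; next }
    # Any other line: keep the row if it is the first one seen or has a larger $2.
    (max=="") || ($2>max) { dist=$1; max=$2 }
    # The last file has no following "first line", so report it here.
    END { print fname "   distance: " dist "   max: " max }
' ./nVT_*K/rdf.dat)

printf '%s\n' "$result"
rm -rf "$tmp"
```

With that sample data it should print one line per file: distance 2.0 with max 1.9 for nVT_277K, and distance 3.0 with max 1.4 for nVT_283K.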
Ed Morton
  • Thank you Ed, this was exactly what I needed and it works equally well using both the "Mac" and GNU awk versions. I'll dissect the code and figure out where I went wrong. Might have been the 34°C temperature in the office yesterday that made logical thinking a bit hard. – gusgrave Aug 09 '18 at 08:47
  • That first script actually won't work on a Mac unless that Mac is running GNU awk. It won't work on OSX/BSD awk because `ENDFILE` is a GNU awk extension, it's not part of the POSIX spec. I added a version that'll work in any awk. – Ed Morton Aug 09 '18 at 12:46
  • Well, I did not expect it to run either (having experienced issues with END/ENDFILE before), though to my surprise, running the command with "awk" or "gawk" produces the same results without reported errors. – gusgrave Aug 16 '18 at 11:00
  • You won't get an error report since `ENDFILE` to any non-gawk is simply an uninitialized variable and so has the value zero-or-null which evaluates to a false condition in the context in my first script and so it will not produce any output. You could replace ENDFILE with AARDVARK and get the same result with a non-GNU awk. – Ed Morton Aug 16 '18 at 12:17
  • You know what, I can't tell you why or how though it does work and I am one happy awk n00b. You have yourself a fantastic weekend and another huge thank you for saving me several hours of work – gusgrave Aug 17 '18 at 13:17
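
The uninitialized-variable point from the comments can be checked in any awk. A small sketch, using the made-up name AARDVARK from Ed Morton's comment as the pattern (the input lines `a` and `b` and the shell variable names are just for illustration):

```shell
#!/bin/sh
# AARDVARK is never assigned, so as a pattern it is zero-or-null,
# which is false: the action never runs and nothing is printed.
out_unset=$(printf 'a\nb\n' | awk 'AARDVARK { print "matched: " $0 }')

# Once the variable is given a non-zero value, the same pattern is
# true for every input line.
out_set=$(printf 'a\nb\n' | awk -v AARDVARK=1 'AARDVARK { print "matched: " $0 }')

printf 'unset: [%s]\n' "$out_unset"
printf 'set:   [%s]\n' "$out_set"
```

A non-GNU awk treats `ENDFILE` the same way as the unset case here, which is why the first script runs without an error there but should print nothing.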

Assuming there is at least one positive value (so that we don't need to initialize):

$ awk 'FNR==1    {f=FILENAME}
       $2>max[f] {max[f]=$2; dist[f]=$1} 
       END       {for(f in max) print f, "distance:", dist[f], "max:", max[f]}' files

max and dist are indexed by filename, since that has to be unique within a given path.

karakfa
  • Thank you! Closer, though not right. This does iterate over all input files and the output looks as intended, however the function seems to evaluate the wrong value, seemingly the "dist" value ($1) instead of "max" ($2). This is the output: `./nVT_331K/rdf_rdf_aam_aam_COM.dat distance: 14.950000 max: 0.983862 ./nVT_325K/rdf_rdf_aam_aam_COM.dat distance: 14.950000 max: 0.983969 ./nVT_319K/rdf_rdf_aam_aam_COM.dat distance: 14.950000 max: 0.982654` 14.950000 is the last and largest number of $1 (0.000000-14.950000); I need $1 when $2 is "max". – gusgrave Aug 09 '18 at 08:42