I am trying to process two files consecutively with awk. The last line of the first file contains a date, and I want to use that date as input when evaluating the second file. Unfortunately, I don't understand how to detect the end of the first file, read the date, and then continue with the next file. I have found answers suggesting FNR==NR, but I have not been able to implement them correctly. As a poor man’s solution I tried hardcoding the number of lines, but that is not a terribly smart thing to do, and I still have problems processing the second file:

    BEGIN initialize the counters



    {
    if(NR==FNR) <<<<<< this is needed to run properly, only NR==FNR fails, why ?!       
    {     
          # file_1      
          do -> from the last line of the first file extract a date 

          next << what is the meaning of this ??
    }                        

    {
          # file_2
          do -> read every line of the second file 
             and sum up the values from one of the columns


    }


    }


    END { divide the sum accumulated from file=2 
          by the time calculated from the last line of file=1}

# for calling the script use :
awk -f SCRIPT file_1 file_2

#example files
# file1 last line
version 1.5 code 11 mpi start /01/12/2014/ 18:33:12 end /01/12/2014/ 20:05:12

#file2

     1.28371E-05    0.2060    0.2060   -8   -8    0    0    0
     1.91616E-05    0.1927    0.1927   -7   -8    0    0    0
     1.27306E-05    0.1567    0.1567   -6   -8    0    0    0
     2.11623E-05    0.1523    0.1523   -5   -8    0    0    0
     1.67914E-05    0.1721    0.1721   -4   -8    0    0    0
     1.47247E-05    0.1851    0.1851   -3   -8    0    0    0
     1.32049E-05    0.1919    0.1919   -2   -8    0    0    0
     1.81256E-05    0.2130    0.2130   -1   -8    0    0    0
     2.63500E-05    0.1745    0.1745    0   -8    0    0    0
     1.99232E-05    0.1592    0.1592    1   -8    0    0    0
     2.08924E-05    0.1537    0.1537    2   -8    0    0    0
     2.44922E-05    0.1459    0.1459    3   -8    0    0    0
     2.53759E-05    0.1902    0.1902    4   -8    0    0    0
     2.30230E-05    0.1708    0.1708    5   -8    0    0    0
     2.10723E-05    0.1636    0.1636    6   -8    0    0    0
     1.86613E-05    0.1915    0.1915    7   -8    0    0    0
     2.05359E-05    0.1649    0.1649    8   -8    0    0    0
     1.09533E-05    0.1765    0.1765   -8   -7    0    0    0
     1.56917E-05    0.1740    0.1740   -7   -7    0    0    0
     1.52199E-05    0.2145    0.2145   -6   -7    0    0    0
     .....   

I would appreciate any help. Thank you in advance.

Alex

Alexander Cska
  • It sounds like what you want is absolutely trivial in awk, but clarify what you mean by `At the end of the first file I am reading a date`, as there are several possibilities: e.g. you're reading it from a file (in which case why not do it before the script runs), getting it from a variable (ditto), prompting someone to enter it, or something else. The right solution for you depends on what you're doing at that step. – Ed Morton Jan 22 '14 at 14:52
  • I would like to apologize for the inconvenience. I am reading one file, say file A. This file contains a date and time at its end. I read this time and proceed to the second file, where I use the time as input for some expression. In other words, from the first file I extract a variable whose value is used for processing the second file. – Alexander Cska Jan 22 '14 at 15:25
  • I posted an answer, see if that's all you need. If not, post a script that demonstrates your problem along with some sample input and expected output. The script you posted seems to have a lot of complexity completely unrelated to the problem you're describing, so it'd help us to help you if we didn't have to read through all of that just to see the actual issue. – Ed Morton Jan 22 '14 at 15:54
  • How are you passing the files to awk? If you did `awk -f script.awk file1 file2` or even `cat file1 | awk -f script.awk - file2`, `FNR` would be 1 for the first record of each file. In the case where `FNR!=NR&&FNR==1`, you have changed files, and the record read just before that one is the date you seek. However, if you did `cat file1 file2 | awk -f script.awk`, `FNR!=NR` would never be true because standard input is a single file, and that is where awk would read from. –  Jan 22 '14 at 16:01
  • I have updated and simplified my script and using pseudo code tried to explain my problem. I hope this will help. – Alexander Cska Jan 23 '14 at 11:01
  • +1 Nice explanation. I guess now the posters below can update their answers and help you :) Good luck! – Håkon Hægland Jan 23 '14 at 11:10
  • @AlexanderCska please post some sample input and expected output. It SOUNDS like the answers posted already answer your question so some in/out would help us understand what it is we/you are missing. – Ed Morton Jan 23 '14 at 13:48
  • I have updated my original message including the last line of file1 where the date is, together with "head -20 file2" – Alexander Cska Jan 23 '14 at 15:19
  • ...and STILL no expected output. Sigh.... – Ed Morton Jan 23 '14 at 20:27
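
To make the `NR`/`FNR` behaviour described in the comments above concrete, here is a small sketch. The file names and contents are made up for illustration; only the awk variables `NR` and `FNR` come from the discussion.

```shell
# Throwaway input files with hypothetical contents.
tmpdir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmpdir/file_1"
printf 'b1\nb2\n'     > "$tmpdir/file_2"

# NR counts records across all inputs; FNR restarts at 1 for each file.
counters=$(awk '{ print NR, FNR, $0 }' "$tmpdir/file_1" "$tmpdir/file_2")

# When FNR is 1 but NR is not, a new file has just started, so the
# record saved on the previous iteration is the last line of the old file.
last=$(awk 'FNR == 1 && NR != 1 { print "last line of previous file:", prev }
            { prev = $0 }' "$tmpdir/file_1" "$tmpdir/file_2")

# Through a single pipe there is only one input, so NR never differs from FNR.
piped=$(cat "$tmpdir/file_1" "$tmpdir/file_2" | awk 'NR != FNR { n++ } END { print n + 0 }')

echo "$counters"
echo "$last"
echo "records with NR != FNR when piped: $piped"
rm -rf "$tmpdir"
```

The second command only reports the file change; it relies on `prev` having been saved while the previous record was being processed.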

4 Answers

You can do this a couple of ways:

  • Buffer each line and check when FNR==1

Something like:

awk 'FNR==1 && NR!=1{print line,"is last in first file"}NR>1{print line}{line=$0}' file1 file2

  • If you are using gawk you can use the ENDFILE block.

Or:

gawk '{print $0} ENDFILE {if (!f) {print $0,"is last line in first file"; f=1}}' file1 file2

Chris Seymour
  • Hi, thanks for the help. Sadly enough it didn't work as expected, the code ended up printing the entire files on the screen. I also tried modifying my script in accordance with your suggestion but to no avail. – Alexander Cska Jan 22 '14 at 12:16

I set variables on the command line to accomplish this:

awk 'F==1 {print "one: ", $0} F==2 {print "two: ", $0}' F=1 one.txt F=2 two.txt

Whenever awk encounters a command-line argument of the form x=y among the file names, it sets the awk variable x to y just before it opens the next file.

Jan
  • Hi, I have updated my script. Actually, what you proposed is in the right direction. The expressions F==1 and F==2 ensure that I am reading the proper file. But how do I detect the end of the first file? I can use a regex (F==1 && /regex/), but I presume a much more elegant solution exists. – Alexander Cska Jan 23 '14 at 11:09
  • In GNU Awk there is an `ENDFILE{}` rule, so you could try `ENDFILE { if (FNR==NR) date=$0 }` – Håkon Hægland Jan 23 '14 at 11:23
  • @AlexanderCska Do you really need to know the last line? Can't you just keep a variable `lastLineFile1 = $0` in the block for the first file, and in the `END` clause extract the date/time you need from `lastLineFile1`? – Jan Jan 23 '14 at 12:47
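
The idea in the last comment can be sketched as follows. This is only an illustration: the first line of `one.txt` and the contents of `two.txt` are made up, the last line of `one.txt` is the sample line from the question, and taking the final whitespace-separated field as the end time is an assumption about what needs to be extracted.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'an earlier line' \
  'version 1.5 code 11 mpi start /01/12/2014/ 18:33:12 end /01/12/2014/ 20:05:12' > "$tmpdir/one.txt"
printf '1.0 0.2\n2.0 0.1\n' > "$tmpdir/two.txt"

summary=$(awk '
F == 1 { lastLineFile1 = $0 }   # overwritten on every line, so it ends up holding the last one
F == 2 { sum += $1 }            # accumulate column 1 of the second file
END {
    n = split(lastLineFile1, parts, " ")
    # Assumption: the end time is the final field of the saved line.
    print "end time:", parts[n], "sum:", sum
}' F=1 "$tmpdir/one.txt" F=2 "$tmpdir/two.txt")

echo "$summary"
rm -rf "$tmpdir"
```

No end-of-file detection is needed here, because the variable simply keeps whatever the last `F == 1` record assigned to it.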

If you just want the date from the last line of the first file and the contents of the second file for processing by awk, you can do this and make life easier:

(tail -n 1 firstfile; cat secondfile) | awk 'something' -

Of course, if the date is not exactly the last line, you could do something like this:

(grep '^Date' firstfile; cat secondfile) | awk 'something' -

This way you will only have a single "file/stream" to deal with in awk and the first line will be your date.
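
As a rough sketch of what the `'something'` could look like under this approach: the first record of the stream is treated as the date line and everything after it as data. The file contents and the choice of summing column 1 are assumptions for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'header\nstart 18:33:12 end 20:05:12\n' > "$tmpdir/firstfile"
printf '1.5\n2.5\n' > "$tmpdir/secondfile"

# The space in "$( (" keeps the shell from reading it as arithmetic expansion.
result=$( (tail -n 1 "$tmpdir/firstfile"; cat "$tmpdir/secondfile") | awk '
NR == 1 { date = $0; next }   # first record of the stream is the date line
{ sum += $1 }                 # every later record comes from the second file
END { print date; print sum }' -)

echo "$result"
rm -rf "$tmpdir"
```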

Mark Setchell

It sounds like all you need is something like:

awk '
NR==FNR {
   # do file1 stuff
   date = $0
   next
}
{
   # do file2 stuff using the variable "date", which is set to the last line of file1
}
' file1 file2

If that's not all you need, post some sample input and expected output to help clarify what you're trying to do.

Ed Morton
  • I tried simplifying my sample code so that my problem becomes more understandable. If I understand your idea correctly, NR==FNR ensures that I am reading the first file only, because for the first file the local counter FNR and the global counter NR are equal. For the second file they are shifted by the number of lines of the first file. But how do I detect exactly the end of the first file? – Alexander Cska Jan 23 '14 at 13:01
  • In gawk you can use `ENDFILE` but so far I've seen nothing to indicate you need that. In the sample I posted, while reading file2 and in the END section, the variable `date` will be populated with the value of the last line of the first file. So, why is that not all you need? – Ed Morton Jan 23 '14 at 13:42
  • Hi Ed, I think your idea was OK. The problems disappeared when I changed NR==FNR to an if statement, if(NR==FNR), and I don't know why. Moreover, what exactly does "next" do? – Alexander Cska Jan 23 '14 at 20:21
  • You do not need to change NR==FNR to an if statement, you instead need to get rid of the opening `{` and closing `}` you mistakenly added to your script. Also - do not put a newline between `NR==FNR` and `{` - white space matters to awk. `next` tells awk to read the next record instead of continuing processing the current record. – Ed Morton Jan 23 '14 at 20:23
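
Putting the pieces from this thread together, a minimal end-to-end sketch might look like the following. It uses the last line of file1 and the first two rows of file2 from the question. The helper `to_seconds`, the field positions 8 and 11, the choice of column 1, and the assumption that start and end fall on the same day are all illustrative guesses, since no expected output was posted.

```shell
tmpdir=$(mktemp -d)
# Last line copied from the question; the first line is a made-up placeholder.
printf '%s\n' 'some earlier log line' \
  'version 1.5 code 11 mpi start /01/12/2014/ 18:33:12 end /01/12/2014/ 20:05:12' > "$tmpdir/file_1"
# First two rows of the sample file_2.
printf '%s\n' '     1.28371E-05    0.2060    0.2060   -8   -8    0    0    0' \
  '     1.91616E-05    0.1927    0.1927   -7   -8    0    0    0' > "$tmpdir/file_2"

report=$(awk '
# Convert HH:MM:SS to seconds since midnight (t is a local scratch array).
function to_seconds(hms,    t) {
    split(hms, t, ":")
    return t[1] * 3600 + t[2] * 60 + t[3]
}
NR == FNR {            # true only while reading the first file
    last = $0          # overwritten each time, so it ends up as the last line
    next               # skip the remaining rules for this record
}
{ sum += $1 }          # reached only for records of the second file
END {
    split(last, field, " ")
    # Assumes start and end fall on the same day, as in the sample line.
    elapsed = to_seconds(field[11]) - to_seconds(field[8])
    printf "elapsed=%d rate=%.3e\n", elapsed, sum / elapsed
}' "$tmpdir/file_1" "$tmpdir/file_2")

echo "$report"
rm -rf "$tmpdir"
```

Because of `next`, the unconditional `{ sum += $1 }` rule never sees a record from the first file, which is why no outer braces or `if` statement are needed.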