
I have the code below, which is supposed to do two things on tab-delimited files:

  1. Calculate the sum of one field for the whole file
  2. Return the number of records in the file

I am facing 2 problems:

  1. The calculated total seems fine for some files, but in other files awk seems to stop at some record in the middle of the file and doesn't continue to the end. Is there some special character that awk mistakes for end of file?
  2. I get zero instead of the total number of records for every file.

Could someone please tell me what I am doing wrong? I am running this through a .bat file on Windows 7.

BEGIN { FS="\t" }
  { sum[FILENAME] += $42 }
END 
{tr=NR}

{
for (i=1;i<ARGC;i++)
    printf "%s %15d %d\n",ARGV[i],sum[ARGV[i]],tr
}

Thanks

Ross

  • When you say `it seems to stop`, do you mean the process hangs, or that it ends without printing anything, or that it ends but the printf shows some unexpected value, or something else? What is the purpose of the `tr` variable as opposed to just printing `NR`? When you invoke your script, are you ONLY specifying file names in the arg list, or are you setting variables there too? Show us how you're running the command and what it outputs. – Ed Morton Apr 03 '14 at 12:32
  • Also - you say you want a script to `Calculate the sum of one field for the whole file` and `Return the number of records in the file` but the script you posted seems to be intended to calculate the sum and print the total number of records across many files. What DO you want it to do? – Ed Morton Apr 03 '14 at 12:39
  • Hi Ed, feedback on both your comments: the process doesn't hang. It returns a total for the field based on the records it has traversed, but that total is not correct because it has not gone through all the records in the file. I am running this script on multiple tab-delimited .txt files, invoking it from a .bat file with `awk -f SumColumnRecordCount.awk *.txt`. I expect the output to show the filename, the sum of the amount and the number of records for every .txt file it runs on. Thanks – user2473726 Apr 03 '14 at 13:06
  • @EdMorton I got rid of the `{tr=NR}` as per your suggestion and printed NR directly. But, as you accurately observed, this returns the total number of records across all the files. I tried `FNR`, but that returns the number of records for the first file only. How do I get it to return the number of records per file, the same way the amount is summed per file? Thanks – user2473726 Apr 13 '14 at 13:09
  • FNR holds the number of records read so far in the current file. With GNU awk you can populate an array in ENDFILE with the values of FNR. With other awks you need to set a variable {fnr=FNR} in the main part of your code and then save that in an FNR==1 section. Rather than try to solve this in comments, you should really post a new question with sample input and expected output and tell us there what awk version you are using. – Ed Morton Apr 13 '14 at 14:41
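Ed's comment names two routes: GNU awk's ENDFILE, or saving FNR in an FNR==1 section. A simpler portable variant, swapped in here rather than taken from the thread, is to bump a per-file counter array next to `sum[]` on every record. This is a minimal sketch assuming a POSIX shell with awk (for example under Cygwin); the sample files `a.txt` and `b.txt` and their contents are made up, and field 42 is the amount field from the question.

```shell
#!/bin/sh
# Sketch: per-file sum of field 42 plus a per-file record count (portable awk).
# Assumption: a.txt and b.txt are hypothetical sample files built below.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical inputs: 41 filler fields, then the amount in field 42.
awk 'BEGIN { for (r = 1; r <= 3; r++) { for (i = 1; i <= 41; i++) printf "x\t"; print r * 10 } }' > a.txt
awk 'BEGIN { for (r = 1; r <= 2; r++) { for (i = 1; i <= 41; i++) printf "x\t"; print 5 } }' > b.txt

cat > SumColumnRecordCount.awk <<'EOF'
BEGIN { FS = "\t" }
{
    sum[FILENAME] += $42   # running total of field 42 for the current file
    cnt[FILENAME]++        # one more record seen in the current file
}
END {
    for (i = 1; i < ARGC; i++)
        printf "%s %15d %d\n", ARGV[i], sum[ARGV[i]], cnt[ARGV[i]]
}
EOF

result=$(awk -f SumColumnRecordCount.awk a.txt b.txt)
printf '%s\n' "$result"
```

Because both arrays are keyed by FILENAME, each output line reports only its own file's total and count; an empty file prints 0 for both, since an unset array element formats as 0 with `%d`.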

1 Answer


I assume your END block should look like this instead:

END {
    tr=NR
    for (i=1;i<ARGC;i++)
        printf "%s %15d %d\n",ARGV[i],sum[ARGV[i]],tr
}
glenn jackman
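To see what the corrected END block actually prints, here is a small run on two made-up files (`a.txt` with three records, `b.txt` with two), assuming a POSIX shell with awk. Since NR in the END block holds the total across all input files, both output lines report 5 records, which is the behaviour the follow-up comments run into.

```shell
#!/bin/sh
# Sketch: run the answer's corrected END block on two hypothetical files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical inputs: 41 filler fields, then the amount in field 42.
awk 'BEGIN { for (r = 1; r <= 3; r++) { for (i = 1; i <= 41; i++) printf "x\t"; print r * 10 } }' > a.txt
awk 'BEGIN { for (r = 1; r <= 2; r++) { for (i = 1; i <= 41; i++) printf "x\t"; print 5 } }' > b.txt

result=$(awk '
BEGIN { FS = "\t" }
{ sum[FILENAME] += $42 }
END {
    tr = NR   # NR in END is the total record count over ALL input files
    for (i = 1; i < ARGC; i++)
        printf "%s %15d %d\n", ARGV[i], sum[ARGV[i]], tr
}' a.txt b.txt)
printf '%s\n' "$result"
```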
  • Thanks Glenn. That fixed my point 2, but point 1 still remains: it doesn't carry on to the end of the file. I cannot locate any special characters that might be making it stop partway. Even the number of records shows the count only up to where it stops. Any ideas? – user2473726 Apr 03 '14 at 10:43
  • Unless you're on Windows, where the Ctrl-Z char might affect you, there's no magic character that stops awk. Can you come up with a reproducible example that you can post here? http://stackoverflow.com/help/mcve – glenn jackman Apr 03 '14 at 10:53
  • I seem to have identified the issue. One of the fields contains a character that I cannot paste here, as it shows up as a blank. It's a tiny symbol representing a right arrow. If I remove it, awk runs to the end of the file. Is there any way to handle this? Thanks – user2473726 Apr 03 '14 at 13:01
  • Here is an image of it. It's not visible in Notepad, but I could see it in Excel: http://i58.tinypic.com/161nb00.jpg – user2473726 Apr 03 '14 at 13:19
  • Apparently this character trips up even Perl, which treats it as EOF, as per this post: http://stackoverflow.com/questions/21950579. Any idea, folks, how to resolve this? – user2473726 Apr 03 '14 at 13:42
  • Looks like it is a magic character after all, the control-Z that Glenn mentioned earlier, see http://stackoverflow.com/questions/11547443/right-arrow-symbol-causing-abrupt-end-of-fread. Running `dos2unix -ascii` on it should remove the character. – Ed Morton Apr 03 '14 at 14:48
  • @EdMorton I tried dos2unix, but it results in the following error messages: `dos2unix: Binary symbol 0x1A found at line 3` and `dos2unix: Skipping binary file abc.txt`. Further research on the net shows that the tr command is recommended, but I am not sure how to install it. Thanks – user2473726 Apr 03 '14 at 19:29
  • Can you install cygwin (http://cygwin.org/)? It provides a UNIX-like shell that runs in a window on top of Windows and you can run `tr` and any other UNIX tool from that. – Ed Morton Apr 03 '14 at 21:28
  • @EdMorton Thanks. I installed Cygwin and ran tr. Works fine. – user2473726 Apr 09 '14 at 07:52
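The fix the thread converges on can be sketched as follows, assuming a Unix-like shell such as Cygwin with grep, tr and awk available; `abc.txt` and its contents are hypothetical. The stray character is taken to be Ctrl-Z (byte 0x1A, octal 032), which the DOS code page 437 font draws as a right arrow. The sketch first reports which lines contain that byte, then deletes it with `tr -d`, writing to a new file because tr only filters a stream.

```shell
#!/bin/sh
# Sketch: find and delete Ctrl-Z bytes (0x1A, octal 032) before running awk.
# Assumptions: a Unix-like shell such as Cygwin; abc.txt is a made-up sample file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical input: line 2 has a Ctrl-Z byte inside a field.
printf 'a\t10\nb\tx\032y\t20\nc\t30\n' > abc.txt

# 1. Report which lines contain the byte (LC_ALL=C keeps grep byte-oriented).
ctrlz=$(printf '\032')
hits=$(LC_ALL=C grep -n "$ctrlz" abc.txt | cut -d: -f1)
printf 'Ctrl-Z on line(s): %s\n' "$hits"

# 2. Delete every such byte; tr never edits in place, so write a new file.
tr -d '\032' < abc.txt > abc.clean.txt

# 3. Sanity checks: no 0x1A bytes left, and all records are still present.
left=$(tr -cd '\032' < abc.clean.txt | wc -c | tr -d ' ')
records=$(awk 'END { print NR }' abc.clean.txt)
printf 'Ctrl-Z bytes left: %s, records: %s\n' "$left" "$records"
```

A Unix awk does not treat 0x1A as end of file, so the record count here is only a sanity check; the point of the cleanup is that the cleaned files can then be fed to the Windows awk from the .bat file, which, per the comments, is where Ctrl-Z can cut the input short.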