fatal: division by zero attempted when trying to find mean?

Question

I'm trying to find the mean of several numbers in a file, taken from the lines that contain "<Overall>".

My code:

awk -v file=$file '{if ($1~"<Overall>") {rating+=$1; count++;}} {rating=rating/count; print file, rating;}}' $file | sed 's/<Overall>//'

I'm getting

awk: cmd. line:1: (FILENAME=[file] FNR=1) fatal: division by zero attempted

for every file. I can't see why count would be zero if the file does contain a line such as "<Overall>5".

EDIT: Sample from the (very large) input file, as requested:

<Author>RW53
<Content>Location! Location?       view from room of nearby freeway 
<Date>Dec 26, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>3
<Value>4
<Rooms>3
<Location>2
<Cleanliness>4
<Check in / front desk>3
<Service>-1
<Business service>-1

Expected output:

[filename] X

Where X is the average of the numbers on all the lines containing <Overall>.

input $file is the location of the folder containing the input file — daltojam, Mar 03 '17 at 13:25
You are dividing by `count` on every line, even those before you've set `count` for the first time. You appear to just be missing the pattern `END` prior to that block. `awk '... END { rating=rating/count; ... }` (although you still need to ensure that at least one rating was found). — chepner, Mar 03 '17 at 13:27
I'm getting a syntax error when I put the 'END {rating = rating ...}' in the line — daltojam, Mar 03 '17 at 13:30
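A minimal sketch of the fix described in the comment above, assuming a small hypothetical sample file in place of the real review files. Two details of the command as posted may be relevant here: it ends with one more `}` than it opens, which would produce a syntax error on its own, and a field such as `<Overall>3` evaluates to 0 in a numeric context, so the tag has to be stripped before adding.

```shell
# Hypothetical sample file standing in for one of the review files.
file=$(mktemp)
printf '%s\n' '<Author>RW53' '<Overall>3' '<Value>4' '<Overall>2' > "$file"

# The averaging block is guarded by END, so it runs once after the last line
# instead of on every line. sub() strips the tag so $1 is a plain number.
result=$(awk '$1 ~ /^<Overall>/ { sub(/<Overall>/, "", $1); rating += $1; count++ }
              END { if (count) print FILENAME, rating / count }' "$file")
echo "$result"
rm -f "$file"
```

With the two sample ratings 3 and 2, the second column of the output is 2.5; the first column is whatever temporary path `mktemp` returned.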

Inian · Accepted Answer · 2017-03-03T13:45:53.173

Use Awk as below:

awk -F'<Overall>' 'NF==2 {sum+=$2; count++}
                   END{printf "[%s] %s\n",FILENAME,(count?sum/count:0)}' file

For an input file named input-file that contains two <Overall> lines, like this:

<Author>RW53
<Content>Location! Location?       view from room of nearby freeway
<Date>Dec 26, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>3
<Value>4
<Rooms>3
<Location>2
<Cleanliness>4
<Check in / front desk>3
<Service>-1
<Business service>-1
<Overall>2

Running it produces,

[input-file] 2.5

The -F'<Overall>' part splits each input line on the delimiter <Overall>. Only lines that contain <Overall> split into two fields, so the NF==2 pattern selects just those lines. The number after the tag is $2, which is added to sum, and the number of matching lines is tracked in count.

The END clause is executed after all lines have been read. It prints the filename using the special Awk variable FILENAME, which holds the name of the file being processed, and computes the average only if count is not zero (otherwise it prints 0).
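To see the field splitting described above in isolation, here is a small sketch that feeds two sample lines to the same separator and prints NF and $2 for each. The `printf` input is just illustrative data taken from the sample.

```shell
# Show how -F'<Overall>' splits a matching and a non-matching line.
out=$(printf '%s\n' '<Overall>3' '<Value>4' |
      awk -F'<Overall>' '{ printf "NF=%d second=[%s]\n", NF, $2 }')
echo "$out"
# NF=2 second=[3]
# NF=1 second=[]
```

The matching line yields an empty first field and the rating as the second field, which is why NF==2 works as the filter.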

Can you explain how this works? I'm quite new to bash and awk. Thanks! — daltojam, Mar 03 '17 at 13:32
@daltojam: See if the explanation helped and solve your problem! — Inian, Mar 03 '17 at 13:50

chepner · Answer 2 · 2017-03-03T15:45:03.873

You aren't waiting until you've completely read the file to compute the average rating. This is simpler if you use patterns rather than an if statement. You also need to remove <Overall> before you attempt to increment rating.

awk '$1 ~ /<Overall>/ {sub("<Overall>", "", $1); rating+=$1; count++;}
     END {rating=rating/(count?count:1); print FILENAME, rating;}' "$file"

(Answer has been updated to fix a typo in the call to sub and to correctly avoid dividing by 0.)

edited Mar 03 '17 at 15:45

answered Mar 03 '17 at 13:33

I've got some unexpected output with this answer. The ratings in all the files are out of 5 but all of the output is greater than this. I think the logical OR is choosing 1 every time even when count!=0. However when I remove the "||1" every output is 1. Any ideas? – daltojam Mar 03 '17 at 13:49
@daltojam the first argument to sub() should be "", not " – linuxfan says Reinstate Monica Mar 03 '17 at 13:58
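The symptom in the comment above (every average coming out as 1) is what you would see if the return value of sub() is added to the total: sub() returns the number of substitutions it made, not the modified string. A quick sketch on a single sample line:

```shell
# sub() edits $1 in place and returns how many replacements were made.
out=$(echo '<Overall>3' | awk '{ n = sub("<Overall>", "", $1); print n, $1 }')
echo "$out"
# 1 3
```

So the rating has to be read from $1 after the call rather than from the value sub() returns.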

score 0 · Answer 3 · answered Mar 03 '17 at 13:57

awk -F '>' '
   # the field separator is >
   # for lines that contain <Overall>
   /<Overall>/ {
       # add the rating to the sum and increment the counter
       Rate+=$2;Count++}
   # at the end of the file
   END{
      # print the average
      printf( "[%s] %f\n", FILENAME, Rate / ( Count + ( ! Count ) ) )
      }
   ' "${File}"

# one liner
awk -F '>' '/<Overall>/{r+=$2;c++}END{printf("[%s] %f\n",FILENAME,r/(c+(!c)))}' "${File}"

Note:

( c + ( ! c ) ) relies on the logical NOT (!): !c is 1 if c = 0 and 0 otherwise. So if c = 0 it adds 1, and otherwise it adds 0, ensuring a divisor of at least 1.
This assumes the whole file has the same format as the sample.
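As a quick check of the divisor trick in the note above, this sketch evaluates the expression for a zero and a non-zero counter (the values 0 and 4 are arbitrary):

```shell
# c + (!c) is 1 when c is 0 and c itself otherwise.
out=$(awk 'BEGIN { c = 0; print c + (!c); c = 4; print c + (!c) }')
echo "$out"
# 1
# 4
```

Note that with this approach an input without any <Overall> lines prints an average of 0 rather than reporting that no ratings were found.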
