
I am trying to read a file line by line and find the average of the numbers in each line. I am getting the error: expr: non-numeric argument

I have narrowed the problem down to sum=`expr $sum + $i`, but I'm not sure why the code doesn't work.

while read -a rows
do
    for i in "${rows[@]}"
    do
        sum=`expr $sum + $i`
        total=`expr $total + 1`
    done
    average=`expr $sum / $total`
done < $fileName

The file looks like this (the numbers are separated by tabs):

1       1       1       1       1
9       3       4       5       5
6       7       8       9       7
3       6       8       9       1
3       4       2       1       4
6       4       4       7       7
NateDawg87
  • I think your code is ok, maybe your file contains non-numeric (integer) values. – Reto Aebersold Oct 03 '15 at 01:24
  • No, the file is all numbers. It's a grid of all numbers. Is it possible it is looking at the spaces in between the numbers and trying to compute those? – NateDawg87 Oct 03 '15 at 01:33
  • @NateDawg87 Can you include in the question a sample line (or two) from the file that causes your code errors? – John1024 Oct 03 '15 at 01:34
  • @John1024 I added what the file looks like – NateDawg87 Oct 03 '15 at 01:38
  • @NateDawg87 Is `$filename` supposed to be the input file or the output file? – John1024 Oct 03 '15 at 01:54
  • @John1024 `$fileName` is the input file. I copied the code wrong, I actually have `done < $fileName` in my code. Just edited it. Still same problem though. – NateDawg87 Oct 03 '15 at 01:56
  • You really need to initialize sum and total at the beginning of each line. Otherwise, they keep adding to the values from the previous line. – rici Oct 03 '15 at 02:06
  • While you can't actually do floating point math in bash, you can fake it for a simple task like this. Check out the lower section of my answer, below. – ghoti Oct 03 '15 at 23:15

4 Answers


With some minor corrections, your code runs well:

while read -a rows
do
    total=0
    sum=0
    for i in "${rows[@]}"
    do
        sum=`expr $sum + $i`
        total=`expr $total + 1`
    done
    average=`expr $sum / $total`
    echo $average
done <filename

With the sample input file, the output produced is:

1
5
7
5
2
5

Note that the answers are what they are because expr only does integer arithmetic.
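
To see the truncation on its own, here is a minimal sketch. int_avg is a hypothetical helper name, not something from the question, and it uses bash's $(( )) on the assumption that the inputs are non-negative integers, where it truncates the same way expr does.

```shell
#!/bin/bash
# int_avg is a hypothetical helper: integer average of its arguments.
int_avg() {
    local sum=0 n
    for n in "$@"; do
        sum=$(( sum + n ))
    done
    echo $(( sum / $# ))   # integer division drops the fractional part
}

int_avg 9 3 4 5 5   # sum is 26; 26/5 is 5.2, printed as 5
```

The second row of the sample file really averages 5.2, which is why the output above shows 5 for it.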

Using sed to preprocess for expr

The above code could be rewritten as:

$ while read -r row; do expr '(' $(sed 's/[[:space:]][[:space:]]*/ + /g' <<<"$row") ')' / $(wc -w <<<"$row"); done < filename
1
5
7
5
2
5

Using bash's builtin arithmetic capability

expr is archaic. In modern bash:

while read -a rows
do
    total=0
    sum=0
    for i in "${rows[@]}"
    do
        ((sum += $i))
        ((total++))
    done
    echo $((sum/total))
done <filename

Using awk for floating point math

Because awk does floating point math, it can provide more accurate results:

$ awk '{s=0; for (i=1;i<=NF;i++)s+=$i; print s/NF;}' filename
1
5.2
7.4
5.4
2.8
5.6
John1024

Some variations on the same trick of using the IFS variable.

#!/bin/bash

while read line; do
    set -- $line
    echo $(( ( $(IFS=+; echo "$*") ) / $# ))
done < rows

echo

while read -a line; do
    echo $(( ( $(IFS=+; echo "${line[*]}") ) / ${#line[*]} ))
done < rows

echo

saved_ifs="$IFS"
while read -a line; do
    IFS=+
    echo $(( ( ${line[*]} ) / ${#line[*]} ))
    IFS="$saved_ifs"
done < rows
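
The trick shared by all three loops is that "$*" (and "${line[*]}") joins its words with the first character of IFS. A minimal sketch of just that step, where join_plus is a hypothetical helper name used only for illustration:

```shell
#!/bin/bash
# join_plus is a hypothetical helper: joins its arguments with "+".
join_plus() {
    local IFS=+     # "$*" joins the arguments with the first character of IFS
    echo "$*"
}

expr_text=$(join_plus 9 3 4 5 5)
echo "$expr_text"                # prints 9+3+4+5+5
echo $(( ($expr_text) / 5 ))     # prints 5 (integer division of 26 by 5)
```

Making IFS local to the function keeps the change from leaking into the rest of the script, which is the same reason the third loop above saves and restores it.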
Harvey

Others have already pointed out that expr is integer-only, and recommended writing your script in awk instead of shell.

Your system may have a number of tools on it that support arbitrary-precision math, or floats. Two common calculators in shell are bc, which follows the standard "order of operations", and dc, which uses "Reverse Polish notation".

Either one of these can easily be fed your data such that per-line averages can be produced. For example, using bc:

#!/bin/sh

while read line; do
  set - ${line}
  c=$#
  string=""
  for n in $*; do
    string+="${string:++}$1"
    shift
  done
  average=$(printf 'scale=4\n(%s) / %d\n' $string $c | bc)
  printf "%s // avg=%s\n" "$line" "$average"
done

Of course, the only bc-specific part of this is the format for the notation and the bc itself in the third-to-last line. The same basic thing using dc might look like this:

#!/bin/sh

while read line; do
  set - ${line}
  c=$#
  string="0"
  for n in $*; do
    string+=" $1 + "
    shift
  done
  average=$(dc -e "4k $string $c / p")
  printf "%s // %s\n" "$line" "$average"
done

Note that my shell supports appending to strings with +=. If yours does not, you can adjust this as you see fit.

In both of these examples, we're printing our output to four decimal places -- with scale=4 in bc, or 4k in dc. We are processing standard input, so if you named these scripts "calc", you might run them with command lines like:

$ ./calc < inputfile.txt

The set command at the beginning of the loop turns the $line variable into positional parameters, like $1, $2, etc. We then process each positional parameter in the for loop, appending everything to a string which will later get fed to the calculator.
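
As a small illustration of that step on its own, here is a sketch using one tab-separated row from the question. split_row is a hypothetical name, and it uses bash's $'...' quoting to spell out the tabs.

```shell
#!/bin/bash
# split_row is a hypothetical helper: shows what "set" does to one input row.
split_row() {
    set -- $1                     # unquoted on purpose: word splitting yields one parameter per number
    echo "count=$# first=$1"
}

split_row $'9\t3\t4\t5\t5'        # prints count=5 first=9
```

Inside a function, set only replaces that function's positional parameters, so the caller's $1, $2, and so on are left alone.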


Also, you can fake it.

That is, while bash doesn't support floating point numbers, it DOES support multiplication and string manipulation. The following uses NO external tools, yet appears to present decimal averages of your input.

#!/bin/bash

declare -i total

while read line; do

  set - ${line}
  c=$#
  total=0
  for n in $*; do
    total+="$1"
    shift
  done

  # Move the decimal point over prior to our division...
  average=$(($total * 1000 / $c))
  # Re-insert the decimal point via string manipulation
  average="${average:0:$((${#average} - 3))}.${average:$((${#average} - 3))}"
  printf "%s // %0.3f\n" "$line" "$average"

done

The important bits here are:

  • declare which tells bash to add to $total with += rather than appending it as if it were a string,
  • the two average= assignments, the first of which multiplies $total by 1000, and the second of which splits the result at the thousands column, and
  • printf whose format enforces three decimal places of precision in its output.
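
To make the declare point concrete, here is a minimal sketch of how += behaves with and without the integer attribute. The function and variable names are hypothetical, chosen only for this illustration.

```shell
#!/bin/bash
# append_vs_add is a hypothetical demo of += with and without the integer attribute.
append_vs_add() {
    local plain=1
    plain+=2            # no integer attribute: += appends, giving the string "12"

    local -i num=1
    num+=2              # integer attribute: += adds, giving 3

    echo "$plain $num"
}

append_vs_add           # prints 12 3
```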

Of course, input still needs to be integers.
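
One edge case worth knowing about: if the scaled value has fewer than three digits (an average below 0.1), the slicing in the script above ends up with a negative offset and the decimal point lands in the wrong place. A hedged variant that pads with zeros first might look like the sketch below. fixed_avg is a hypothetical name, and it assumes non-negative integer input.

```shell
#!/bin/bash
# fixed_avg is a hypothetical helper: average to three decimals using integer math only.
# Assumes non-negative integer arguments.
fixed_avg() {
    local total=0 n scaled
    for n in "$@"; do
        total=$(( total + n ))
    done
    scaled=$(( total * 1000 / $# ))          # shift the decimal point before dividing
    printf -v scaled '%04d' "$scaled"        # pad so at least one digit precedes the point
    printf '%s.%s\n' "${scaled:0:${#scaled}-3}" "${scaled: -3}"
}

fixed_avg 9 3 4 5 5      # prints 5.200
```

Like the original, this truncates rather than rounds the last digit.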

YMMV. I'm not saying this is how you should solve this, just that it's an option. :)

ghoti

This is a pretty old post, but it came up at the top of my Google search, so I thought I'd share what I came up with:

while read line; do
    # Convert each line to an array
    ARR=( $line )

    # Append each value in the array with a '+' and calculate the sum
    #   (this causes the last value to have a trailing '+', so it is added to '0')
    ARR_SUM=$( echo "${ARR[@]/%/+} 0" | bc -l)

    # Divide the sum by the total number of elements in the array
    echo "$(( ${ARR_SUM} / ${#ARR[@]} ))"
done < "$filename"
Sam