1

Here's a sample of my matrix:

A   B   C   D   E
1   0   0   1   1
0   0   0   0   0

0   0   1   1   0
0   2           1

You can think of each row as a respondent and each column as an item on a questionnaire.

My goal is to take an average of the sum of each row (i.e. total score for each respondent) without creating a new column AND accounting for the fact that some or all of the entries in a given row are empty (e.g., some respondents
missed some items [see row 5] or didn't complete the questionnaire entirely [see row 3]).

The desired solution for this matrix = 1.67, whereby

[1+0+0+1+1 = 3] + [0+0+0+0+0 = 0] + [0+0+1+1+0 = 2]/3 == 5/3 = 1.67

As you can see, we have averaged over three values despite there being five rows because one has missing data.

I am already able to take an average of the sum of rows which are only summed for non-missing entries, e.g.,:

=AVERAGE(IF(AND(A1<>"",B1<>"",C1<>"",D1<>"",E1<>""),SUM(A1:E1)),IF(AND(A2<>"",B2<>"",C2<>"",D2<>"",E2<>""),SUM(A2:E2)),IF(AND(A3<>"",B3<>"",C3<>"",D3<>"",E3<>""),SUM(A3:E3)),IF(AND(A4<>"",B4<>"",C4<>"",D4<>"",E4<>""),SUM(A4:E4)),IF(AND(A5<>"",B5<>"",C5<>"",D5<>"",E5<>""),SUM(A5:E5)))

However, this results in a value of 1 because it treats any row with some or all values values as = 0.

It does the following:

[1+0+0+1+1 = 3] + [0+0+0+0+0 = 0] + [0+0+0+0+0 = 0] + [0+0+1+1+0 = 2] + [0+0+0+0+0 = 0]/4 == 5/5 = 1

Does anyone have any ideas about how to adapt the current code to average over non-missing values or an alternative way of achieving the desired result?

PyjamaNinja
  • 293
  • 2
  • 8
  • 1
    So if you have (say) one missing value in a row, you want to take the average of the other four values, and if all the values in a row are missing, you want to ignore that row completely? – Tom Sharpe Jul 24 '18 at 10:05
  • You are correct and I have updated the answer to make this more explicit. Cheers, PJnin – PyjamaNinja Jul 24 '18 at 10:46

2 Answers2

2

You can do this more concisely with an array formula, but the short answer to fix up your existing formula is, if you have a blank cell in your sheet somewhere (say it's F1) AVERAGE will ignore blank cells so change your formula to

=AVERAGE(IF(AND(A1<>"",B1<>"",C1<>"",D1<>"",E1<>""),SUM(A1:E1),F1),IF(AND(A2<>"",B2<>"",C2<>"",D2<>"",E2<>""),SUM(A2:E2),F1),IF(AND(A3<>"",B3<>"",C3<>"",D3<>"",E3<>""),SUM(A3:E3),F1),IF(AND(A4<>"",B4<>"",C4<>"",D4<>"",E4<>""),SUM(A4:E4),F1),IF(AND(A5<>"",B5<>"",C5<>"",D5<>"",E5<>""),SUM(A5:E5),F1))

This would be one array formula version of your formula - it uses OFFSET to pull out each row of the matrix then SUBTOTAL to see if every cell in that row has a number in it. Then it uses SUBTOTAL again to work out the sum of each row and AVERAGE to get the average of rows.

=AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1))),""))

Has to be entered as an array formula using CtrlShiftEnter

Note 1 - some people don't like using OFFSET because it is volatile - you can use matrix multiplication instead but it's arguably less easy to understand.

Note 2 - I used "" instead of referring to an empty cell. Interesting that the non-array formula needed an actual blank cell but the array formula needed an empty string.

You can omit the empty string

=AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))))
Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37
1

Basically, what you're describing here for your desired result is the =AVERAGEA() function

The Microsoft Excel AVERAGEA function returns the average (arithmetic mean) of the numbers provided. The AVERAGEA function is different from the AVERAGE function in that it treats TRUE as a value of 1 and FALSE as a value of 0.

With that in mind, the formula should look like this.

=SUM(AVERAGEA(A1:A4),AVERAGEA(B1:B4),AVERAGE(C1:C4),AVERAGEA(D1:D4),AVERAGEA(E1:E4))

Produces the expected result:

enter image description here


Note, if you want to ROUND() the result to two digits, add the following formula to it:

=ROUND(SUM(AVERAGEA(A1:A4),AVERAGEA(B1:B4),AVERAGE(C1:C4),AVERAGEA(D1:D4),AVERAGEA(E1:E4)), 2)
Samuel Hulla
  • 6,617
  • 7
  • 36
  • 70
  • Thank you for taking the time to do this, but I am limited to averaging horizontally not vertically. This is because some rows include partially missing data (e.g., see row 5 in the example above), and I would like to ignore these rows entirely. – PyjamaNinja Jul 24 '18 at 10:48
  • @PyjamaNinja I don't understand. This is the equivalent to ignoring, it even returns your desired result. – Samuel Hulla Jul 24 '18 at 12:10
  • Apologies - perhaps my description is still not clear enough so I will try to demonstrate. If you add a '2' to cell B3 in the spreadsheet you pasted, the result changes to 2.17 right? I want to ignore a row even if it includes some values. I only want to analyze full rows. I still +1'd your answer because it was helpful and informative. Hopefully that clarifies things. PJnin – PyjamaNinja Jul 24 '18 at 12:14