
I am facing an issue with weighted-average logic in Excel.
I am looking at 4 fields for different deliverables: Total, Complete, Pending, and Weight.

The weighted average for a particular deliverable is calculated as (Complete/Total) * Weight; for example, (5/10) * 0.20 = 0.10 = 10%.

For each deliverable, I have calculated the percentage and then added all of the percentages together.

Deliverables 1-10: 10% + 20% + 5% + .... = 65%
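The per-deliverable calculation can be sketched in Python (the three (Complete, Total, Weight) tuples below are hypothetical values, chosen so the contributions come out to 10%, 20%, and 5%):

```python
# Hypothetical (Complete, Total, Weight) values for three deliverables.
deliverables = [(5, 10, 0.20), (10, 10, 0.20), (2, 8, 0.20)]

# Each deliverable contributes (Complete/Total) * Weight to the total.
total = sum(c / t * w for c, t, w in deliverables)
print(total)  # 0.1 + 0.2 + 0.05 = 0.35 (35%)
```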

My question is:
If, for a particular deliverable, the available field is 'Not Applicable' or '0', then the weighted average for that deliverable will be 0%, and the total will thereby be on the lower side. So, is there a better way to represent this? How can I solve this issue?

Rikin
  • I believe you should elaborate your question more; I am finding it difficult to understand. What does `the available field is 'Not Applicable' or '0'` mean? – frunkad Aug 30 '16 at 08:44
  • For a few deliverables, there might be no Total or Available volume, so it will be 0. If it is 0, then the weighted average for that deliverable will be 0%, which compromises the weighted-average aggregation across all deliverables. – Rikin Aug 30 '16 at 08:47

1 Answer


OK, so if I am understanding you correctly, you are saying the following:

(complete/total) * weight = weighted_average

Where this is one of many weighted averages that will be summed to provide a total_deliverable_value.

You have a case where this algorithm outputs 0% for a weighted_average due to a data issue. This is falsely lowering the total_deliverable_value.

Options:

  1. Add assumed values for missing data (clearly mark them as assumed rather than recorded fact).
  2. Have a standard weight for missing data. For example, all missing data gets 20%.
  3. Go source the missing data.
  4. Change how you are calculating the total_deliverable_value, i.e.

instead of:

total_deliverable_value = sum(weighted_average::weighted_average)

use something like this:

total_deliverable_value = sum(weighted_average::weighted_average) / count_of_data_values

where count_of_data_values is the number of non-missing data values you had.
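A minimal Python sketch of option 4, assuming missing deliverables are represented as None:

```python
# Weighted averages per deliverable; None marks a missing ('Not Applicable') value.
weighted_averages = [0.2, None, 0.2, 0.2, 0.2]

# Sum only the non-missing values and divide by their count.
present = [wa for wa in weighted_averages if wa is not None]
total_deliverable_value = sum(present) / len(present)
print(total_deliverable_value)  # 0.8 / 4 = 0.2
```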

Updated based on comments

You have to change the formula for all calculations. So take a current 'perfect' scenario of 100%, such as 5 weighted_averages where each weighted_average = 0.2 (20%).

Your original calculation would give this:

a) 0.2+0.2+0.2+0.2+0.2 = 1 (100%) 
b) 0.2+NA+0.2+0.2+0.2 = 0.8 (80%)

This would indicate that b is not as good as a (let's not argue whether missing data means that is true); however, you wish to show that they should actually be considered almost identical.

The new calculation would give:

a) 1/5 => 0.2 (20%) 
b) 0.8/4 => 0.2 (20%)

Therefore both have the same total_deliverable_value. If you want to scale these back up to 100%, do this:

total_deliverable_value = (sum(weighted_average::weighted_average) / count_of_data_values) * count_of_expect_values

Now we are looking like this:

a) ((0.2+0.2+0.2+0.2+0.2)/5)*5 = 1 (100%) 
b) ((0.2+NA+0.2+0.2+0.2)/4)*5 = 1 (100%)
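As a Python sketch (using the same assumed None convention for missing values), both scenarios above come out at 100%:

```python
def scaled_total(weighted_averages):
    """Average the non-missing values, then scale back up by the expected count."""
    present = [wa for wa in weighted_averages if wa is not None]
    return sum(present) / len(present) * len(weighted_averages)

print(scaled_total([0.2, 0.2, 0.2, 0.2, 0.2]))   # scenario a: 1.0 (100%)
print(scaled_total([0.2, None, 0.2, 0.2, 0.2]))  # scenario b: 1.0 (100%)
```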

UPDATE: Handling inconsistent max weighted_averages

The above all works because it is assumed that all the individual weighted_averages have a consistent max value, such as a max of 0.2 for 5 values:

[0.2, 0.1, 0.18, 0.08, 0.05]  ==>  61%

Therefore we can afford to lose a few of the data points and still make a calculation that enables the total_deliverable_value to be compared:

[0.2, 0.1, NA, NA, 0.05]   ==>   58%

However, if the max values are inconsistent, say the 1st weighted_average's max was 0.5, we get a break in our result:

[0.4, 0.09, NA, NA, 0.15]   ==>  106%

where 100% should be our max total_deliverable_value. There is a simple solution to this: use the weighted_average max value in the formula. We are then no longer missing 40% of our values; we are missing 20%:

[0.4 (max 0.5), 0.09 (max 0.1), NA (max 0.1), NA (max 0.1), 0.15 (max 0.2)] ==> 80% 

compared against something similar with all its data points:

[0.4, 0.09, 0.05, 0.03, 0.15] ==> 72%

This is the new formula:

total_deliverable_value = (sum(weighted_average::weighted_average) / portion_of_max_values_existing) * total_of_max_values
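A Python sketch of this final formula, using the illustrative (value, max) pairs from the example above (None again marks a missing value):

```python
# (weighted_average, max_weight) pairs; None marks a missing value.
data = [(0.4, 0.5), (0.09, 0.1), (None, 0.1), (None, 0.1), (0.15, 0.2)]

total_of_max_values = sum(m for _, m in data)                  # 1.0
present = [(wa, m) for wa, m in data if wa is not None]
portion_of_max_values_existing = sum(m for _, m in present)    # 0.8

# Normalize by the weight actually present, then scale to the full weight.
total_deliverable_value = (sum(wa for wa, _ in present)
                           / portion_of_max_values_existing) * total_of_max_values
print(total_deliverable_value)  # 0.64 / 0.8 * 1.0 = 0.8 (80%)
```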
ABrowne
  • Thanks, this is very helpful – Rikin Aug 30 '16 at 09:19
  • Sorry but quick question. count_of_data_values, can you clarify my understanding as stated below Del.1 = 10/20 * 0.1 = 5 Del.2 = 5/10 * 0.1 = 5 Del.3 = 10/10 * 0.1 = 1 Del.4 = 0/0 (not applicable or having 0 values) Del.5 = 0/10 * 0.1 = 0 total_deliverable_value = (5+5+1+0) / 4 (as 4 out of 5 have data points although one data point is 0/10) = 11/4 Is this correct? – Rikin Aug 30 '16 at 10:44
  • Yes. 11/4 => 2.75. Now if you compared this against a similar deliverable with 5 complete values (assume for now that the missing one equals 3). Then you would have 14/5 => 2.8. – ABrowne Aug 30 '16 at 10:50
  • Ok, so suppose my weights are evenly assigned for all 5 deliverables to make it 100%, all weights are 0.2. In that case, the total would be 10+10+20+0/4 = 40/4 = 10% which will be quite less than expected. I mean the total goes down by 4 times which I am not sure if it is correct – Rikin Aug 30 '16 at 10:54
  • @Rikin I have updated the question, hopefully this helps clarify it. – ABrowne Aug 30 '16 at 11:37
  • This makes complete sense. Thank you. I appreciate your help. – Rikin Aug 31 '16 at 04:40
  • A, I have one scenario here, D1: (29/30) * 0.2 = 0.19 (19%) D2 to D6 are all 0 so that is total_deliverable_value = ((0.19+0+0+0+0+0)/1)*6 = 1.16 which becomes 116% when converted to percentage. Why is it going more than 100%? – Rikin Aug 31 '16 at 10:45
  • because the formula we created was based on an assumption that the max each of the weighted_averages was equal. i.e. if 3 deliverables made up 100% then each of the weighted_averages would have been capped at 33%. I have updated the answer to provide a solution to this. – ABrowne Aug 31 '16 at 13:17
  • A, Sorry I did not understand your solution clearly. total_deliverable_value = (sum(weighted_average::weighted_average) / portion_of_max_values_existing) * total_of_max_values Can you help me with the below example: Max weight D1 - 0.35 D2 - 0.2 D3 - 0.15 D4 - 0.05 D5 - 0.15 D6 - 0.1 Total is 1.0 that is 100% Available values are D1 - 29/30 D2 - 0 D3 - 0 D4 - 0 D5 - 0 D6 - 0 I am not sure how the weighted_average, portion_of_max_values_existing and total_of_max are computed. Sorry to bother. – Rikin Sep 01 '16 at 06:14
  • Assuming only D1 is available. Therefore the portion_of_max_values_existing is 35, and the total_of_max_values is 100. (((29/30)*0.35)/35)*100 = 96%. The formula essentially works out the average quality of the value and assumes the rest are the same. Therefore, if only 1 value exists, the formula could be simplified to 29/30. If we now assume you had a D2 value for the above of 23/41, this would be the result: (( ((29/30)*0.35) + ((23/41)*0.2) ) /55)*100 = 81%. Obviously, with fewer values available the accuracy of the result is reduced due to the increase in assumed values. – ABrowne Sep 02 '16 at 08:04
  • A, I need to provide a short blurb for the above calculation that we encountered for business reference purposes to my lead. Can you kindly help provide a short brief note on explaining the methodology for the same? – Rikin Oct 06 '16 at 18:50
  • The formula calculates the weighted average for your products based on the best available information. Where information is incomplete, the formula estimates the missing values based on the information available to enable the weighted average to be comparable to others. Results using missing values should present a +/- error guide equal to the effect of the missing values being either 0 or 1. – ABrowne Oct 07 '16 at 12:46
  • Thank you! Appreciate your help. – Rikin Oct 10 '16 at 02:05
  • A, I have a scenario. Di = Available/Total * Weight D1 = 40/100*0.35 = 14 D2 = 0/20*0.2 = 0 D3 = 0/0*0.15 (NA) so will exclude D4 = 0/50*0.05 = 0 D5 = 0/0*0.15 (NA) so will exclude D6 = 0/10*0.1 = 0 Based on the formula above, it should be ((14+0+NA+0+NA+0)/70)*100 = 20% How can the total % be more than 14% given that only D1 is 14% and everything else is 0? Shouldn't the value be 14% in that case? – Rikin Nov 01 '16 at 12:22
  • A, Sorry to bother. Can you kindly guide me through the above scenario? – Rikin Nov 02 '16 at 00:57
  • No. It is assumed to be higher than 14%. You are basing the answer on an approximation of the truth. Have you ever heard of the concept of a cat inside a sound-proof, opaque box? Is the cat dead or alive? Until you have more input, such as smell or the amount of time it has been in there, the only fair assumption is that there is an equal chance of it being alive or dead. So the weighted outcome would be 50%. – ABrowne Nov 02 '16 at 09:42
  • The more inputs you have that total the whole picture, the more accurate the result is. As mentioned in my answer, it is appropriate to represent a +/- accuracy next to your outcomes. It helps with the overall comparison. For example, (a) 50% +/- 20% is not equal to (b) 50% +/- 0%. (a)'s quality is 'similar' but not the 'same' as (b)'s. – ABrowne Nov 02 '16 at 09:45