11

Excel produces scatter diagrams for sets of pair values. It also gives the option of producing a best fit trendline and formula for the trendline. It also produces bubble diagrams which take into consideration a weight provided with each value. However, the weight has no influence on the trendline or formula. Here is an example set of values, with their mappings and weights.

    Value Map     Weight
    0       1      10
    1       2      10
    2       5      10
    3       5      20
    4       6      20
    5       1      1

With Excel's trendline, the mapping for value 5 has too much influence on the formula. Is there any way to produce a formula that reflects the respective weights?

As a help, I've introduced a weighted average for five consecutive values. But is their a better way?

Tams
  • 182
  • 1
  • 1
  • 8
  • 1
    There's something [here](http://www.pcreview.co.uk/forums/weighted-trendline-t3837585.html) that looks like it might work for you. – andy holaday Jun 18 '12 at 23:06
  • 1
    Addendum: The Function I pointed to in my previous comment does work but the outputs are reversed. They are actually {intercept, slope}. – andy holaday Jun 18 '12 at 23:33
  • Thanks Andy. I may be misunderstanding this, but for the above example, the function returns two values (slope 0.6659 and intercept -0.35825). Does this mean that the equation should be y = 0.6659x - 0.35825? If so, then the results are way out. I must be missing something... – Tams Jun 19 '12 at 02:01
  • Hmm, when I tested with your data I got slope 1.135 and intercept 1.441. Note the parameter order is (Y,X,Weight) (not X,Y,weight) (^: – andy holaday Jun 19 '12 at 03:19
  • @Tams: as Andy suggests it looks like your result is based on plotting the data with Value on the Y axis and Map on the X axis. Swapping columns A and B and applying one of the methods below gives the same results. – lori_m Jun 19 '12 at 07:44
  • 1
    @andy and lori - thanks, I did reverse my x and y values. All three answers from you and brettdj are excellent but each has its disadvantage and advantage. Expanding the data as Brett suggests would help solve polynomial equations. But it's slightly impractical for my real world data which has weights in tens of thousands as well as several in single figures. I'd also have to find some means of creating that artificial data in large volumes without excel complaining about its 32000 row limit for charts. You've certainly given me food for thought though. – Tams Jun 19 '12 at 09:52
  • Is there an equation to calculate the weighted R-squared value? – Tams Jun 19 '12 at 14:51
  • There are several ways to calculate Rsquared. I've added one way which agrees with the trendline shown in the other post. – lori_m Jun 20 '12 at 10:34

2 Answers2

14

With data in A2:C7, based on the standard weighted least squares formula, you can try:

=LINEST(B2:B7*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0)

entered with CTRL+SHIFT+ENTER in E2:F2 or any 2x1 range. This also returns {1.1353,1.4412}.

For Rsquared you can enter:

=INDEX(LINEST((B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0,1),3,1)

Explanation of formula

Consider first a normal regression of y on X using LINEST. If const = TRUE the regressor matrix is the augmented matrix consisting of a column of ones followed by the regressor columns i.e. X'=(1,X). If const = FALSE the regressor matrix is simply X so running the regression with a column of ones included gives the same estimates as running without a column of ones and setting const=TRUE.

Now consider a weighted least squares regression. The regression is now Wy on WX'=(W1,WX) where W is the diagonal matrix consisting of the square root of the weights. Since there is not a column of ones present, we must set const = FALSE and use two columns in the regressor matrix.

Rsquared Calculation

Setting stats to TRUE in the LINEST output of the first formula we get in the third and fifth rows:

SSres = 59.76
SSreg(u) = 1461.24
SSTot(u) = 1521
Rsq(u) = 1 - 59.76/1521 = 0.9607 

Note these values are uncentered versions (u) since the const=FALSE (refer to MS Help on LINEST for further info.) For the centered versions (c) we need to subtract the weighted average as below:

SSTot(c) =SUMPRODUCT(C2:C7*(B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))^2) = 244.93
Rsq(c) = 1 - 59.76/244.93 = 0.756
lori_m
  • 5,487
  • 1
  • 18
  • 29
  • @Tams , you can return the R2 with Lori's approach by return the stats function. Array enter formula `=INDEX(LINEST(B2:B7*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,TRUE,TRUE),3,1)` for a calculated intercept, this one `=INDEX(LINEST(B2:B7*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,FALSE,TRUE),3,1)` for intercept set to 0 (which lori_m used above although I would recommend this only for specific cases where it must be forced) – brettdj Jun 20 '12 at 02:09
  • @brettdj something does not make sense. lori_m's formula clearly does not return intercept=0. – andy holaday Jun 20 '12 at 02:34
  • @andyholaday. agree it's a little confusing, I've added further explanation, partly for my benefit(!) @brettdj: the formula requires `const=FALSE` which means the Rsq value is the uncentered value. – lori_m Jun 20 '12 at 09:56
  • 1
    +1 lori_m. From your edited posted I see that your formula returns the same values as my more complicated effort. Yours is the superior solution. – brettdj Jun 20 '12 at 23:40
2

Update
Based on the additional information that you have tens of thousands of rows, here is a VBA UDF that will do the job (including the r2)

As per the screenshot below it provides the same m , x and r2 values that my expanded data set did in my original answer

enter image description here

Public Function LinestWeighted(xRng As Range, yRng As Range, wRng As Range, bInt As Boolean, bStat As Boolean) As Variant
    Dim x As Variant
    Dim y As Variant
    Dim W As Variant
    Dim TotX As Variant
    Dim TotY As Variant
    Dim lngRow As Long
    Dim strDelim As String
    Dim strX As String
    Dim strY As String
    Dim NewSeries As Variant

    x = Application.Transpose(xRng)
    y = Application.Transpose(yRng)
    W = Application.Transpose(wRng)
    strDelim = ","

    If (UBound(x, 1) = UBound(y, 1)) And (UBound(x, 1) = UBound(W, 1)) Then
        For lngRow = 1 To UBound(W)
            strX = strX & Application.WorksheetFunction.Rept(x(lngRow) & strDelim, W(lngRow))
            strY = strY & Application.WorksheetFunction.Rept(y(lngRow) & strDelim, W(lngRow))
        Next lngRow
        TotX = Split(Left$(strX, Len(strX) - 1), strDelim)
        TotY = Split(Left$(strY, Len(strY) - 1), strDelim)
        ReDim NewSeries(1 To UBound(TotX) + 1, 1 To 2)
        For lngRow = 0 To UBound(TotX)
            NewSeries(lngRow + 1, 1) = CDbl(TotX(lngRow))
            NewSeries(lngRow + 1, 2) = CDbl(TotY(lngRow))
        Next
        With Application
            LinestWeighted = .WorksheetFunction.LinEst(.Index(.Transpose(NewSeries), 2), .Index(.Transpose(NewSeries), 1), bInt, bStat)
        End With
    Else
        LinestWeighted = "input ranges must be equal in length"
        Exit Function
    End If
End Function

Initial Answer

Just expand out your data series by your weighting factors

So rather than try to graph 6 pairs, use the ratio of your highest to lowest to repeat the points

i.e. graph

0       1     `10 times`  
1       2     `10 times`    
...
5       1     `once`    

enter image description here

Guillaume Jacquenot
  • 11,217
  • 6
  • 43
  • 49
brettdj
  • 54,857
  • 16
  • 114
  • 177
  • Thanks guys. You certainly know your stuff! – Tams Jun 20 '12 at 21:05
  • @Tams this question should be awarded to lori_m – brettdj Jun 20 '12 at 23:41
  • Have done so. Here's a link to a page where the author attempts to create a polynomial equation. I tried it out but it produces an error. Perhaps it could be fixed and adapted for weights? http://www.tushar-mehta.com/publish_train/data_analysis/16.htm – Tams Jun 21 '12 at 02:09
  • Here's another site that provides an equation for a polynomial. http://www.eng-tips.com/viewthread.cfm?qid=184726. The equation is: Equation: y = (c2 * x^2) + (c1 * x ^1) + b. WHERE c2: =INDEX(LINEST(y,x^{1,2}),1). C1: =INDEX(LINEST(y,x^{1,2}),1,2). b = =INDEX(LINEST(y,x^{1,2}),1,3). – Tams Jun 21 '12 at 04:25
  • @Tams: You can change: `IF({1,0},1,A2:A7)` to `IFERROR(A2:A7^{0,1},1)` in the formulas from my post with the same results. Now just add extra digits to the array for a weighted polynomial fit eg use {0,1,2} for a quadratic. – lori_m Jun 23 '12 at 20:00