-1

I am having an issue where the logarithmic function is behaving differently depending on the type of graph I use with the same data. When I generate the equation by hand, it returns the scatterplot linear trendline, but the slope function and linear graph produce a different trendline.

Linear vs Scatter Linear vs Scatter The equation for the scatter plot logarithmic line is:

y = -0.079ln(x) + 0.424

The equation for the linear plot trenline is:

y = -0.052ln(x) + 0.3138

I can generate the linear plot trenline slope using this equation:

=SLOPE(B2:B64,LN(A2:A64)) = -0.052

But using the general slope equation, I get the scatter plot trendline (using SQL):

    SELECT SUM(multipliedresiduals) / SUM(xresidsquared)
FROM (
    SELECT *
        ,log(x.x) - l.avgx xresiduals
        ,x.y - l.avgy yresiduals
        ,power(log(x.x) - l.avgx, 2) xresidsquared
        ,((log(x.x) - l.avgx) * (x.y - l.avgy)) multipliedresiduals
    FROM ##logtest x
    CROSS JOIN (
        SELECT avg(log(x)) avgx
            ,avg(y) avgy
        FROM ##logtest l
        ) l
    ) z        =   -0.0789746757495071 (Scatter Plot Slope)

What's going on? I'm mainly interested in replicating the linear plot trenline equation in SQL

Here is the data:

https://docs.google.com/spreadsheets/d/1sOlyXaHnUcCuD9J28cKHnrhhcr2hvYSU1iCNWXcTqEA/edit?usp=sharing

Here is the Excel File:

https://www.dropbox.com/s/6hpd4bzvmbxe5pu/ScatterLinearTest.xlsx?dl=0

TigermanSD
  • 77
  • 1
  • 2
  • 7

1 Answers1

1

Line and Scatter graphs in Excel are quite different with regard to the X-axis. In the case of a scatter graph, the x-axis represents actual values. In the case of a line graph, the x-axis are labels. If you try to compute a slope, with a line graph, the x-axis will have the values of 1,2,3,4, ... no matter what the label shows (e.g: even if it shows 7..69). With a scatter graph, the x-axis will have the value of the label.

In your case, the difference between the two slopes can be explained by the x-axis line graph values starting at 1 (even though it is labelled 7); and the x-axis scatter graph values starting at 7 -- the actual value.

So, in fact, the real slope for the the data you present, with "X" starting at a value of "7", is the slope you get from the scatter graph data, which is the same as you are getting in your SQL.

In order for the SQL equation to match the linear plot trendline equation, you would need to replace the x-axis values with a series [1..n] instead of the actual x-axis values.

I don't have SQL, but the results of these two SLOPE formulas should clarify what I am writing:

Scatter plot: =SLOPE(B2:B64,LN(ROW(INDIRECT("7:69"))))  -0.078974676
Line Plot:    =SLOPE(B2:B64,LN(ROW(INDIRECT("1:63"))))  -0.051735504

The first is the Scatter plot, the second is the Line plot

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60