The point of the quartet is to show very different distributions yielding the same summary statistics, and thus the same best linear fit. To verify this:
>>> import pandas as pd
>>> from scipy.optimize import curve_fit
>>> def tofit(x, a, b): return a*x + b  # the linear model to fit: y = a*x + b
>>> df = pd.read_csv('bla.dat', sep=' ')  # the quartet data, space-separated
>>> df
      x0     y0    x1    y1    x2     y2    x3     y3
0   10.0   8.04  10.0  9.14  10.0   7.46   8.0   6.58
1    8.0   6.95   8.0  8.14   8.0   6.77   8.0   5.76
2   13.0   7.58  13.0  8.74  13.0  12.74   8.0   7.71
3    9.0   8.81   9.0  8.77   9.0   7.11   8.0   8.84
4   11.0   8.33  11.0  9.26  11.0   7.81   8.0   8.47
5   14.0   9.96  14.0  8.10  14.0   8.84   8.0   7.04
6    6.0   7.24   6.0  6.13   6.0   6.08   8.0   5.25
7    4.0   4.26   4.0  3.10   4.0   5.39  19.0  12.50
8   12.0  10.84  12.0  9.13  12.0   8.15   8.0   5.56
9    7.0   4.82   7.0  7.26   7.0   6.42   8.0   7.91
10   5.0   5.68   5.0  4.74   5.0   5.73   8.0   6.89
>>> for i in range(4): curve_fit(tofit, df['x%d' % i], df['y%d' % i])[0]
...
array([0.50009091, 3.00009091])
array([0.5 , 3.00090909])
array([0.49972727, 3.00245453])
array([0.49990909, 3.00172727])
The four arrays are the slope and intercept fitted for each data set. All of them are nearly identical to the line from the tutorial, 0.5x + 3.
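The quartet is constructed so that the summary statistics match as well, and you can check that directly. A minimal sketch using the df loaded above; the well-known shared values are mean(x) = 9, var(x) = 11, mean(y) ≈ 7.50, var(y) ≈ 4.12 and corr(x, y) ≈ 0.816 for every set:

>>> # per-set summary statistics: mean/variance of x and y, plus their correlation
>>> for i in range(4):
...     x, y = df['x%d' % i], df['y%d' % i]
...     print(x.mean(), x.var(), y.mean(), y.var(), x.corr(y))
...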
As you can see, all four fits yield almost exactly the same function, yet their plots show the data sets are quite different - and for some of them a linear fit is a poor model to begin with. This is a warning against the blind global fits that are ubiquitous in our age: it's better to understand something about the distribution intuitively than to just fit and say - oh look, all my sets are the same.
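To actually see the difference, plot each set next to its fitted line. A minimal matplotlib sketch, assuming the df, tofit and curve_fit from above:

>>> import matplotlib.pyplot as plt
>>> fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
>>> for i, ax in enumerate(axes.flat):
...     x, y = df['x%d' % i], df['y%d' % i]
...     a, b = curve_fit(tofit, x, y)[0]  # refit set i
...     ax.scatter(x, y)                  # the raw points
...     ax.plot([x.min(), x.max()], [a*x.min() + b, a*x.max() + b])  # fitted line
...     ax.set_title('set %d' % i)
...
>>> plt.show()

Set 0 is roughly linear with noise, set 1 is a clean parabola, set 2 is a line with one outlier, and set 3 is a vertical cluster with a single leverage point - yet all four get the same fitted line.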