I have some data that I fit a linear function to, and the fit feeds into other work (the details aren't important here). I'm using numpy.polyfit, and when I pass only the data and the degree of the fit, nothing else, it produces this plot:
The fit is okay, but the general consensus is that the line of best fit is being skewed by the red data points above it, and that I should really be fitting to the data just below them, which forms a nice linear shape (beginning around the congested blob of blue points). So I tried adding a weighting to my call to polyfit. I chose an arbitrary weighting of 1/sqrt(y-values), so that points with smaller y-values are weighted more heavily. This gave the following:
This is admittedly better, but I'm still not satisfied, as the line now appears to be too low. Ideally I'd like a middle ground, but since the weighting I chose was really arbitrary, I was wondering whether there is a general way to perform a more robust fit in Python, or whether this can even be done with polyfit? Using a separate package would be fine too, if it works.
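
For reference, here is a minimal sketch of the two calls described above. The data is synthetic and purely a stand-in (it is not the data in my plots), and the variable names are only for illustration. The one thing it assumes about my real data is that all y-values are positive, since the weighting takes a square root.

```python
import numpy as np

# Synthetic stand-in data (hypothetical, not the data from the plots):
# a linear trend with small noise, plus some points pushed upward
# to mimic the red points that sit above the main trend.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 200)
y = 2.0 * x + 5.0 + rng.normal(0.0, 0.5, x.size)
high = rng.choice(x.size, size=30, replace=False)
y[high] += rng.uniform(5.0, 15.0, high.size)

# First attempt: only the data and the degree, nothing else.
slope_plain, intercept_plain = np.polyfit(x, y, 1)

# Second attempt: arbitrary weighting of 1/sqrt(y), which requires y > 0.
# polyfit applies w to the unsquared residuals, so the squared
# residuals end up scaled by 1/y.
w = 1.0 / np.sqrt(y)
slope_w, intercept_w = np.polyfit(x, y, 1, w=w)

print("unweighted:", slope_plain, intercept_plain)
print("weighted:  ", slope_w, intercept_w)
```

The weights here depend on the y-values themselves, which is part of why the choice feels arbitrary to me: points are down-weighted for being large, not for being far from the trend.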