
I am trying to fit a curve to a set of data points while preserving certain characteristics.

As the graph below shows, some of my fitted curves end up almost linear and some do not. I need a functional form that lets me interpolate between the given data points and extrapolate past the last given point.

The curves were created with a simple regression:

def func(x, d, b, c):
    return c + b * np.sqrt(x) + d * x

[Image: graph of the fitted curves, costs over volume]

My question is: what is the best approach to ensure a positive slope past the last data point(s)? In my application, a decrease in costs with increasing volume doesn't make sense, even if the data says so.

I would like to keep the order as low as possible; maybe x^3 would still be fine.

The data used to create the curve with the negative slope is:

x_data = [     100,      560,      791,     1117,     1576,     2225,
       3141,     4434,     6258,     8834,    12470,    17603,
      24848,    35075,    49511,    69889,    98654,   139258,
     196573,   277479,   391684,   552893,   780453,  1101672,
    1555099,  2195148,  3098628,  4373963,  6174201,  8715381,
   12302462, 17365915]
y_data = [  7,   8,   9,  10,  11,  12,  14,  16,  21,  27,  32,  30,  31,
    38,  49,  65,  86, 108, 130, 156, 183, 211, 240, 272, 307, 346,
   389, 436, 490, 549, 473, 536]

And for the curve with the positive slope:

x_data = [     100,      653,      950,     1383,     2013,     2930,
       4265,     6207,     9034,    13148,    19136,    27851,
      40535,    58996,    85865,   124969,   181884,   264718,
     385277,   560741,   816117,  1187796,  1728748,  2516062,
    3661939,  5329675,  7756940, 11289641, 16431220, 23914400,
   34805603, 50656927]
y_data = [  6,   6,   7,   7,   8,   8,   9,  10,  11,  12,  14,  16,  18,
    21,  25,  29,  35,  42,  50,  60,  72,  87, 105, 128, 156, 190,
   232, 284, 347, 426, 522, 640]

The curve fitting is simply done with curve_fit:

from scipy.optimize import curve_fit

popt, pcov = curve_fit(func, x_data, y_data)

For the plot:

# popt holds the parameters in the order of func's signature: d, b, c
plt.plot(x_data, func(np.asarray(x_data), *popt), 'g--', label='fit: d=%.3g, b=%.3g, c=%.3g' % tuple(popt))
plt.plot(x_data, y_data, 'ro')
plt.xlabel('Volume')
plt.ylabel('Costs')
plt.show()
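
To see whether a given fit actually turns downward at the end, the slope of the model can be evaluated analytically. The following is a minimal sketch, not part of the original code: slope is a hypothetical helper holding the derivative of func with respect to x, and the data is the first data set from above.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, d, b, c):
    return c + b * np.sqrt(x) + d * x

def slope(x, d, b, c):
    # Analytic derivative of func with respect to x; the constant c drops out.
    return b / (2.0 * np.sqrt(x)) + d

# First data set from the question (the one whose fit shows a negative slope).
x_data = np.array([100, 560, 791, 1117, 1576, 2225, 3141, 4434, 6258, 8834,
                   12470, 17603, 24848, 35075, 49511, 69889, 98654, 139258,
                   196573, 277479, 391684, 552893, 780453, 1101672, 1555099,
                   2195148, 3098628, 4373963, 6174201, 8715381, 12302462,
                   17365915], dtype=float)
y_data = np.array([7, 8, 9, 10, 11, 12, 14, 16, 21, 27, 32, 30, 31, 38, 49,
                   65, 86, 108, 130, 156, 183, 211, 240, 272, 307, 346, 389,
                   436, 490, 549, 473, 536], dtype=float)

popt, pcov = curve_fit(func, x_data, y_data)

# A negative value here means the fitted curve is falling at the last point.
print("slope at last data point:", slope(x_data[-1], *popt))
```

Since the slope is b / (2 * sqrt(x)) + d, a negative d eventually dominates for large x, no matter how large b is.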
ThatQuantDude

1 Answer


A simple solution might just look like this:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import least_squares

def fit_function(x, a, b, c, d):
    return a**2 + b**2 * x + c**2 * abs(x)**d 

def residuals( params, xData, yData):
    diff = [ fit_function(x, *params ) - y for x, y in zip( xData, yData ) ]
    return diff

# x1Data, y1Data and x2Data, y2Data are the two data sets from the question
fit1 = least_squares( residuals, [ .1, .1, .1, .5 ], loss='soft_l1', args=( x1Data, y1Data ) )
print( fit1.x )
fit2 = least_squares( residuals, [ .1, .1, .1, .5 ], loss='soft_l1', args=( x2Data, y2Data ) )
print( fit2.x )

testX1 = np.linspace(0, 1.1 * max( x1Data ), 100 )
testX2 = np.linspace(0, 1.1 * max( x2Data ), 100 )
testY1 = [ fit_function( x, *( fit1.x ) ) for x in testX1 ]
testY2 = [ fit_function( x, *( fit2.x ) ) for x in testX2 ]

fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( x1Data, y1Data )
ax.scatter( x2Data, y2Data )
ax.plot( testX1, testY1 )
ax.plot( testX2, testY2 )
plt.show()

providing

>>[ 1.00232004e-01 -1.10838455e-04  2.50434266e-01  5.73214256e-01]
>>[ 1.00104293e-01 -2.57749592e-05  1.83726191e-01  5.55926678e-01]

and

[Image: soft fit]

The coefficients enter the fit function as squares, so they cannot become negative; as long as the exponent d is positive (about 0.57 and 0.56 in the fits above), this ensures a positive slope. The sign of a fitted parameter such as b is therefore irrelevant. Naturally, the fit becomes worse if following the decreasing points at the end of data set 1 is forbidden. I'd say those points are just statistical outliers, which is why I used least_squares: it can deal with them through a soft loss (loss='soft_l1'). See the least_squares documentation for details. Depending on what the real data set looks like, I'd think about removing them. Finally, I'd expect zero volume to produce zero costs, so the constant term in the fit function doesn't seem to make sense.
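
The positive-slope claim can be checked directly from the printed parameters. The following is a minimal sketch: fit_slope is a hypothetical helper (not part of the code above) holding the analytic derivative of fit_function for x > 0, and the grid range is an arbitrary choice that covers and exceeds both data sets.

```python
import numpy as np

def fit_function(x, a, b, c, d):
    return a**2 + b**2 * x + c**2 * np.abs(x)**d

def fit_slope(x, a, b, c, d):
    # Derivative of fit_function with respect to x, valid for x > 0.
    return b**2 + c**2 * d * x**(d - 1.0)

# Parameter vectors as printed by the two fits above.
params1 = [1.00232004e-01, -1.10838455e-04, 2.50434266e-01, 5.73214256e-01]
params2 = [1.00104293e-01, -2.57749592e-05, 1.83726191e-01, 5.55926678e-01]

# Logarithmic grid from 1e2 to 1e8.
x = np.logspace(2, 8, 200)

for params in (params1, params2):
    # Both terms of the slope are non-negative when d > 0.
    print(np.all(fit_slope(x, *params) > 0))  # prints True for both sets
```

The check only holds because the fitted d is positive; the squares alone do not constrain the exponent.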

If the function is reduced to the form a**2 * x + b**2 * sqrt(x), i.e. without the constant term, the fit looks like this:

[Image: simplified]

where the green graph is the result of leastsq, i.e. without the f_scale option of least_squares.
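
The simplified form can be sketched as follows. This is only an illustration under assumptions: simple_model and simple_residuals are hypothetical names, the start values [0.1, 0.1] are a guess, and x1Data / y1Data are taken to be the first data set of the question.

```python
import numpy as np
from scipy.optimize import least_squares

def simple_model(x, a, b):
    # No constant term, so zero volume gives zero costs;
    # squared coefficients keep both terms non-negative.
    return a**2 * x + b**2 * np.sqrt(x)

def simple_residuals(params, x, y):
    return simple_model(x, *params) - y

# Assumed to be data set 1 from the question.
x1Data = np.array([100, 560, 791, 1117, 1576, 2225, 3141, 4434, 6258, 8834,
                   12470, 17603, 24848, 35075, 49511, 69889, 98654, 139258,
                   196573, 277479, 391684, 552893, 780453, 1101672, 1555099,
                   2195148, 3098628, 4373963, 6174201, 8715381, 12302462,
                   17365915], dtype=float)
y1Data = np.array([7, 8, 9, 10, 11, 12, 14, 16, 21, 27, 32, 30, 31, 38, 49,
                   65, 86, 108, 130, 156, 183, 211, 240, 272, 307, 346, 389,
                   436, 490, 549, 473, 536], dtype=float)

# Soft loss, as in the fit above, to reduce the pull of the outliers.
fit = least_squares(simple_residuals, [0.1, 0.1], loss='soft_l1',
                    args=(x1Data, y1Data))
print(fit.x)

# Evaluate slightly past the last data point, starting at zero volume.
testX = np.linspace(0, 1.1 * x1Data.max(), 100)
testY = simple_model(testX, *fit.x)
```

By construction the curve starts at zero and never decreases, whatever values the optimizer returns for a and b.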

mikuszefski