4

I have 200k data points and I'm trying to obtain derivative of fitted polynomial. I divided my data set into smaller ones every 0.5 K, the data is Voltage vs Temperature. My code roughly looks like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
testset=pd.read_csv('150615H0.csv',sep='\t')
x=np.linspace(1,220,219)
ub=min(testset['T(K)'])
lb=min(testset['T(K)'])-1
q={i:testset[(testset['T(K)'] < ub+i) & (testset['T(K)'] > lb+i)] for i in x}
f={j:np.polyfit(q[j]['T(K)'],q[j]['Vol(V)'],4) for j in q}
fs={k:np.poly1d(f[k]) for k in f}
fsd={l:np.polyder(fs[l],1) for l in fs}
for kk in q:
    plt.plot(q[kk]['T(K)'],fsd[kk](q[kk]['T(K)']),color='blue',linewidth=2,label='fit')

Unsurprinsingly, the derivative is discontinuous and I don't like it. Is there any other way to fit polynomial locally and get continuous derivative at the same time ?

rth
  • 10,680
  • 7
  • 53
  • 77
GPM
  • 105
  • 5

1 Answers1

3

Have a look at the Savitzky-Gollay filter for an efficient local polynomial fitting.

It is implemented, for instance, in scipy.signal.savgol_filter. The derivative of the fitted polynomial can be obtained with the deriv=1 argument.

rth
  • 10,680
  • 7
  • 53
  • 77
  • I may be mistaken, but as far as I understand Savitzky-Golay filter you need to have mathematical model of your polynomial like in the first example in the cookbook : http://wiki.scipy.org/Cookbook/SavitzkyGolay, as you can add 1 dimensional array as an argument. My problem is slightly different, i don't have any clue what should be global polynomial, i need local fit with precise determination of derivative. – GPM Jul 03 '15 at 21:54
  • @GPM Nope, you don't need to know anything about the polynomial, and it is indeed a local fit. For instance, say `y` is your dataset containing 1D array. Simply calling `savitzky_golay(y, window_size=6, order=3, deriv=1)` would give you the derivative of the 3rd order polynomials locally fitted to your data over a moving window of 6 points. You might need to adapt a bit the derivative since this algorithm assumes uniform sampling, and take only `y` input, not `x` and `y`, but I'm fairly certain that this is what you are looking for. – rth Jul 03 '15 at 22:12