I have some code and I do not understand why applying np.std gives two different results in the two cases below.
import numpy as np
import pandas as pd
import datetime  # needed for the date index further down

a = np.array([1.5, 6., 7., 4.5])
print 'mean value is:', a.mean()
print 'standard deviation is:', np.std(a)
The next lines should do basically the same thing, just in a pandas DataFrame:
base = datetime.datetime(2000, 1, 1)
arr = np.array([base + datetime.timedelta(days=i) for i in xrange(4)])
index_date = pd.Index(arr, name='dates')
data_gas = pd.DataFrame(a, index=index_date, columns=['value'], dtype=float)

# all four dates fall in January 2000, so resampling to monthly
# frequency aggregates them into a single value
mean_pandas = data_gas.resample('M').mean()
standard_deviation = data_gas.resample('M').apply(np.std)
print mean_pandas
print standard_deviation
From the documentation of np.std I can read: "... By default ddof is zero." (ddof stands for delta degrees of freedom.)
np.std(a)
delivers the standard deviation where the divisor is N (the number of values), while ...resample('M').apply(np.std) delivers the standard deviation where the divisor is N - 1. What causes this difference?
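For reference, this is the manual calculation I used to convince myself which divisor each result corresponds to (my own check on the same array a; the variable names are just for illustration):

n = len(a)
# sum of squared deviations from the mean
squared_dev = ((a - a.mean()) ** 2).sum()
# divisor N: reproduces np.std(a)
print 'divisor N  :', np.sqrt(squared_dev / n)
# divisor N - 1: reproduces the resampled standard deviation
print 'divisor N-1:', np.sqrt(squared_dev / (n - 1))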