Say I already have a PDF (probability density function) in a pandas DataFrame:
import pandas as pd
import numpy as np
from scipy import stats
df = pd.DataFrame([1,2,3,4,5,6,5,4,3,2], index=np.linspace(21,30,10), columns=['days'])
df.index.names=['temperature']
print(df)
             days
temperature
21.0            1
22.0            2
23.0            3
24.0            4
25.0            5
26.0            6
27.0            5
28.0            4
29.0            3
30.0            2
If I want to calculate metrics like skewness, I have to convert the PDF back to raw data like this:
temp_history = []
for temp, row in df.iterrows():
    # repeat each temperature once per day it was recorded
    temp_history += int(row['days']) * [temp]
print(temp_history)
[21.0, 22.0, 22.0, 23.0, 23.0, 23.0, 24.0, 24.0, 24.0, 24.0, 25.0, 25.0, 25.0, 25.0, 25.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 27.0, 27.0, 27.0, 27.0, 27.0, 28.0, 28.0, 28.0, 28.0, 29.0, 29.0, 29.0, 30.0, 30.0]
skew = stats.skew(temp_history)
Is there any way I can calculate these metrics without having to create temp_history? Thanks!
Edit: The reason I want to avoid creating the raw data in any form is that I don't want to use up a huge chunk of memory as the numbers in the days column get bigger.
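To make it concrete, here is a rough sketch of the kind of thing I am hoping for: computing the skewness from weighted moments, using the days column as weights, so the raw list never gets built. The helper name weighted_skew is my own (it is not a pandas or SciPy function), and I am assuming the biased (population) definition, which is what stats.skew uses by default.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({'days': [1, 2, 3, 4, 5, 6, 5, 4, 3, 2]},
                  index=pd.Index(np.linspace(21, 30, 10), name='temperature'))

def weighted_skew(values, counts):
    """Biased sample skewness of `values`, each occurring `counts` times.

    Hypothetical helper: uses weighted central moments, so memory use
    depends on the number of distinct values, not on the total count.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    mean = (counts * values).sum() / n
    dev = values - mean
    m2 = (counts * dev**2).sum() / n   # second central moment
    m3 = (counts * dev**3).sum() / n   # third central moment
    return m3 / m2**1.5

skew = weighted_skew(df.index, df['days'])
print(skew)
```

On this small example it should agree with stats.skew(temp_history), but I don't know whether there is a built-in way to do this for skewness and other metrics.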