I have a data frame as shown in the image. I want to take the mean along the column 'trial': for every combination of subject, condition and sample (for example, when all three of these columns have the value one), average the data over the trial column (100 rows).
What I have done in pandas is as follows:
    sub_erp_pd = pd.DataFrame()
    for j in range(1, 4):
        sub_c = subp[subp['condition'] == j]
        for i in range(1, 3073):
            sub_erp_pd = sub_erp_pd.append(sub_c[sub_c['sample'] == i].mean(), ignore_index=True)
But this takes a lot of time, so I am thinking of using dask instead of pandas. In dask, however, I am having trouble creating an empty data frame, the way we create an empty data frame in pandas and then append data to it.
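For reference, the nested loop above appears to compute a per-group mean, which pandas can do in a single `groupby` pass. The following is only a minimal sketch under assumptions: the small `subp` frame and its data columns `ch0` and `ch1` are made up for illustration (the real frame has 3072 samples and 100 trials), and the `trial` column is dropped before averaging, whereas the loop also averages it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `subp`: 1 subject x 3 conditions x 4 samples x 5 trials,
# with two made-up data columns (ch0, ch1).
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [[1], [1, 2, 3], [1, 2, 3, 4], range(1, 6)],
    names=["subject", "condition", "sample", "trial"],
)
subp = pd.DataFrame(
    rng.normal(size=(len(index), 2)), index=index, columns=["ch0", "ch1"]
).reset_index()

# One pass: average over trials within each (subject, condition, sample) group.
sub_erp_pd = (
    subp.drop(columns="trial")
    .groupby(["subject", "condition", "sample"], as_index=False)
    .mean()
)
```

The result has one row per (subject, condition, sample) combination, so no empty data frame or repeated `append` is needed for this step.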
EDIT
As suggested by @edesz, I made changes to my approach:
    %%time
    sub_erp = pd.DataFrame()
    for subno in progressbar.progressbar(range(1, 82)):
        try:
            sub = pd.read_csv('../input/data/{}.csv'.format(subno, subno), header=None)
        except:
            sub = pd.read_csv('../input/data/{}.csv'.format(subno, subno), header=None)
        sub_erp = sub_erp.append(sub.groupby(['condition', 'sample'], as_index=False).mean())
Reading a file using pandas takes 13.6 seconds, while reading a file using dask takes 61.3 ms. But in dask, I am having trouble appending.
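One pattern that avoids appending to an empty data frame is to collect the per-subject results in a plain Python list and concatenate them once at the end. The sketch below uses pandas only and rests on assumptions: `average_trials` and `collect_erps` are hypothetical helper names, and the tiny `demo` frames with a single `value` column stand in for the per-subject CSV files. As far as I know, `dask.dataframe.concat` also takes a list of data frames, so the same list-then-concatenate shape may carry over to dask.

```python
import pandas as pd


def average_trials(sub: pd.DataFrame) -> pd.DataFrame:
    """Mean over rows (trials) for each (subject, condition, sample) group."""
    return sub.groupby(["subject", "condition", "sample"], as_index=False).mean()


def collect_erps(frames) -> pd.DataFrame:
    """Average each subject's frame, then concatenate all results once."""
    parts = []
    for sub in frames:
        parts.append(average_trials(sub))  # list append does not copy data frames
    return pd.concat(parts, ignore_index=True)


# Tiny made-up frames standing in for the per-subject CSV files.
demo = [
    pd.DataFrame({
        "subject": [1, 1, 1, 1],
        "condition": [1, 1, 2, 2],
        "sample": [1, 1, 1, 1],
        "value": [1.0, 3.0, 5.0, 7.0],
    }),
    pd.DataFrame({
        "subject": [2, 2],
        "condition": [1, 1],
        "sample": [1, 1],
        "value": [10.0, 20.0],
    }),
]
sub_erp = collect_erps(demo)
```

With real files, `frames` could be a generator that reads one CSV at a time, so only one subject's raw data is held in memory while its averages are computed.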
NOTE - The original question was titled "Create an empty dask dataframe and append values to it".