I want to go through a DataFrame of about 400k rows and 4 columns, taking out roughly half of the rows with an if statement:
for a in range(0, howmanytimestorunthrough):
    if 'Primary' not in DataFrameexample[a]:
        # take out row
So far I've been testing each of the four approaches below:
newdf.append(emptyline,)
newdf.at[b, 'column1'] = DataFrameexample.at[a, 'column1']
newdf.at[b, 'column2'] = DataFrameexample.at[a, 'column2']
newdf.at[b, 'column3'] = DataFrameexample.at[a, 'column3']
newdf.at[b, 'column4'] = DataFrameexample.at[a, 'column4']
b = b + 1
or the same with .loc:
newdf.append(emptyline,)
newdf.loc[b, :] = DataFrameexample.loc[a, :]
b = b + 1
or changing the if (not in) to an if (in) and using:
DataFrameexample = DataFrameexample.drop([k])
or trying to set emptyline to have values, and then append it:
notemptyline = pd.Series(DataFrameexample.loc[a,:].values, index = ['column1', 'column2', ...)
newdf.append(notemptyline, ignore_index=True)
From what I've managed to test so far, they all seem to work fine on a small number of rows (2,000), but once I have a lot more rows the run time grows much faster than the row count. .at seems slightly faster than .loc, even though I need to call it 4 times per row, but it still gets slow (10 times the rows takes more than 10 times as long). I think .drop tries to copy the DataFrame on every call, so it really doesn't work here? I can't get .append(notemptyline) to work properly; it just seems to replace index 0 over and over again.
I know there must be an efficient way of doing this, but I can't quite get there. Any help?
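To make the goal concrete, here is a minimal sketch of the kind of single-pass filter I suspect I should be using instead of the loop. It is untested on my real data and rests on assumptions: the sample values are made up, the names `df` and `mask` are only for illustration, and I am assuming the 'Primary' check is a substring test on column1.

```python
import pandas as pd

# Hypothetical sample data; the real frame has about 400k rows and 4 columns.
df = pd.DataFrame({
    "column1": ["Primary A", "Secondary B", "Primary C", "Other D"],
    "column2": [1, 2, 3, 4],
    "column3": [10.0, 20.0, 30.0, 40.0],
    "column4": ["w", "x", "y", "z"],
})

# Assumption: the 'Primary' check is a substring test on column1.
# str.contains builds a boolean Series for the whole column at once;
# regex=False matches 'Primary' literally, na=False treats missing values as non-matches.
mask = df["column1"].str.contains("Primary", regex=False, na=False)

# Keep only the rows that do NOT contain 'Primary' (same condition as the loop),
# and renumber the index from 0 so newdf has no gaps.
newdf = df[~mask].reset_index(drop=True)

print(newdf)
```

The idea is that the condition is evaluated once for the whole column and the rows are selected in one operation, so nothing is appended or dropped row by row.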