I have a huge file, around 35 GB, stored in HDF5 format. I have to do certain calculations on some specific columns and want to insert the results as new columns. I know I can assign a new column directly with `df['new_column'] = 0` (or some other value).
But some of my calculations need the value from the previous row. In pandas, we can use `iloc` to get the value at the previous index. However, pandas cannot handle a file this big: I got a memory error most of the times I tried this.
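For illustration, the previous-row pattern described above might look like the following in pandas on a small frame. This is only a sketch: the toy data and the column names `value` and `diff_from_prev` are hypothetical, and the subtraction stands in for whatever the real calculation is.

```python
import pandas as pd

# Hypothetical toy data; the real file is far larger and has other columns.
df = pd.DataFrame({"value": [10, 12, 15, 11]})

# Row-by-row calculation that reads the previous row through iloc.
# The first row has no previous row, so 0 is used as an assumed placeholder.
diffs = [0]
for i in range(1, len(df)):
    current = df["value"].iloc[i]
    previous = df["value"].iloc[i - 1]
    diffs.append(current - previous)

# Store the result as a new column.
df["diff_from_prev"] = diffs
```

This works when the whole frame fits in memory, which is exactly what fails at 35 GB.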
So how can I implement a function in dask that uses the value from the previous row in its calculations? In other words, how can I implement an alternative to `iloc`? I already know how to use `df.apply`.
Code with an implementation would be appreciated. Thank you.