How to implement `iloc` function for dask dataframe?

Question

I have a huge file, around 35GB, stored as HDF5. I need to run calculations on specific columns and insert the results as new columns. I know I can assign a new column directly with

df['new_column'] = 0 (or some other value). However, some of my calculations need the value from the previous row. In pandas, we can use iloc to get the value at the previous index, but pandas cannot handle a file this big; I get a memory error most of the time when I try.

So how can I write a function in dask that uses the value from the previous row in its calculations? In other words, how can I implement an alternative to the iloc method? I already know how to use df.apply.

Code showing an implementation would be appreciated. Thank you.

I don't know Dask. I am going straight to Spark. This sounds hard in Spark too, but bet someone has figured it out already. — Chad Bernier, Aug 02 '18 at 02:08

score 1 · Answer 1 · answered Aug 04 '18 at 13:46


Dask.dataframe does not implement row-wise iloc.

You might be interested in rolling instead. A window of 2 covers the current row and the previous one:

df.rolling(window=2).apply(...)

answered Aug 04 '18 at 13:46

MRocklin
