1

Pandas:

data = data.dropna(axis='columns')

I am trying to do the same thing with a cuDF DataFrame, but the API doesn't seem to offer this functionality.

My current workaround is to convert to a pandas DataFrame, run the command above, and then convert back to a cuDF DataFrame. Is there a better solution?

Sterls
    Apparently that feature is coming in a future release. That said, you can see the proposed implementation in the [GitHub repo](https://github.com/rapidsai/cudf/pull/1126/files/b56bad7a4275189f556d1cb69b52879d94b1595b), and you may be able to repurpose it as a regular function instead of a class method to achieve the desired result. – G. Anderson May 30 '19 at 17:09

2 Answers

2

cuDF now supports column-based dropna, so the following works:

import cudf

df = cudf.DataFrame({'a': [0, 1, None], 'b': [None, 0, 2], 'c': [1, 2, 3]})
print(df)
      a     b  c
0     0  null  1
1     1     0  2
2  null     2  3
df.dropna(axis='columns')
    c
0   1
1   2
2   3
Nick Becker
1

Until dropna is implemented, you can check the null_count of each column and drop the ones with null_count>0.

Thomson Comer