'sub' operator not supported in Dask_cudf

Question

I have a question that came up while following the RAPIDS tutorial at https://docs.rapids.ai/api/cudf/nightly/user_guide/10min.html.

I have a dataframe imported from a CSV file, with the following structure:

x_tick.head()

                LocalTime     Ask     Bid  Spread
0 2004-10-25 00:01:01.975  86.837  86.877    0.04
1 2004-10-25 00:01:19.300  86.791  86.891    0.10
2 2004-10-25 00:01:30.759  86.812  86.842    0.03
3 2004-10-25 00:01:41.798  86.801  86.831    0.03
4 2004-10-25 00:01:42.213  86.794  86.824    0.03

x_tick.dtypes

LocalTime datetime64[ns]
Ask float64
Bid float64
Spread float64
dtype: object

My goal is to perform statistical tests and graph analysis. However, when I follow the RAPIDS tutorial and compute the mean of a column, I get an error:

x_tick.Spread.mean().compute()

TypeError: 'sub' operator not supported between <class 'cudf.core.column.numerical.NumericalColumn'> and <class 'cudf.core.column.string.StringColumn'>

What's happening?

Thank you

There isn't enough information provided for the community to support you effectively. It's possible that one of the partitions in your `Spread` column may unexpectedly be string type, rather than float. As a starting point, you may want to try `x_tick.Spread.map_partitions(lambda x: x.dtype).compute().unique()` to investigate the dtype of every partition. [I recommend creating a minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). — Nick Becker, May 11 '22 at 12:59
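To illustrate how one partition can end up with a different dtype from the rest, here is a minimal CPU-only sketch. It uses plain pandas chunked reading as a stand-in for dask_cudf partitions, which is an assumption made for illustration and not how dask_cudf necessarily reads the file. The CSV text and the `bad` entry are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; "bad" stands in for whatever non-numeric
# value might be hiding somewhere in the real file.
csv_text = "Spread\n0.04\n0.10\n0.03\nbad\n"

# Read the file two rows at a time. Each chunk gets its own dtype
# inference, much like a partition that is parsed independently.
chunk_dtypes = [
    chunk["Spread"].dtype
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)
]

for i, dtype in enumerate(chunk_dtypes):
    print(f"chunk {i}: {dtype}")
```

The first chunk contains only numbers and is inferred as float, while the second chunk contains a non-numeric entry and falls back to a string-like dtype. A reduction that combines both kinds of partitions could plausibly fail with a type error like the one in the question.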
Another option is to enforce numeric dtype using `dd.to_numeric` (see [this blog post](https://coiled.io/blog/dask-dataframe-to-numeric-object-to-float/)). Of course, the source of the non-numeric values should be investigated for more thorough analysis. — SultanOrazbayev, May 11 '22 at 15:28
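As a rough sketch of what coercing to numeric does, and of how the offending values might be located, the example below uses plain `pd.to_numeric` on a small hypothetical Series. It is a pandas stand-in, not the `dd.to_numeric` call itself; whether `dd.to_numeric` behaves identically on a dask_cudf dataframe is an assumption that should be checked against your installed versions.

```python
import pandas as pd

# Hypothetical column values; "bad" is an invented non-numeric entry.
spread = pd.Series(["0.04", "0.10", "bad", "0.03"], name="Spread")

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
numeric = pd.to_numeric(spread, errors="coerce")

# Entries that were present in the source but became NaN are the
# non-numeric values worth investigating.
offenders = spread[numeric.isna() & spread.notna()]
print(offenders)

# The mean now works because NaN values are skipped by default.
print(numeric.mean())
```

Note that coercion silently drops the bad entries from the mean, so it is worth inspecting `offenders` before relying on the result.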
