
I have a question that came up while I was following the methodology of the RAPIDS tutorial at https://docs.rapids.ai/api/cudf/nightly/user_guide/10min.html.

I have a dataframe, imported from a CSV file, with the following structure:

    x_tick.head()

                    LocalTime     Ask     Bid  Spread
    0 2004-10-25 00:01:01.975  86.837  86.877    0.04
    1 2004-10-25 00:01:19.300  86.791  86.891    0.10
    2 2004-10-25 00:01:30.759  86.812  86.842    0.03
    3 2004-10-25 00:01:41.798  86.801  86.831    0.03
    4 2004-10-25 00:01:42.213  86.794  86.824    0.03

    x_tick.dtypes

    LocalTime    datetime64[ns]
    Ask                 float64
    Bid                 float64
    Spread              float64
    dtype: object
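For a Dask-backed dataframe (the `.compute()` call suggests `x_tick` is a dask_cudf collection), `x_tick.dtypes` reports the collection's metadata, which is typically inferred when the CSV is read rather than checked against every partition. So `float64` here does not by itself guarantee that every chunk actually holds floats. The sketch below uses plain pandas DataFrames as hypothetical partitions (the `partitions` data and the `partition_dtypes` helper are illustrative, not part of the question) and collects one dtype per partition, roughly what `x_tick.Spread.map_partitions(lambda x: x.dtype).compute()` would collect in Dask:

```python
import pandas as pd

# Hypothetical stand-ins for two partitions (chunks) of x_tick. The second
# chunk holds Spread as text, e.g. because a stray token such as "n/a"
# stopped the CSV parser from converting that chunk to float64.
partitions = [
    pd.DataFrame({"Spread": [0.04, 0.10, 0.03]}),
    pd.DataFrame({"Spread": ["0.03", "n/a", "0.03"]}),
]


def partition_dtypes(parts, column):
    """Return the dtype name of `column` in each partition, in order."""
    return [str(part[column].dtype) for part in parts]


dtypes = partition_dtypes(partitions, "Spread")
print(dtypes)  # more than one distinct dtype means the partitions disagree
```

If the distinct dtypes are not all `float64`, the metadata and the real data have drifted apart, and any arithmetic that touches the text partition can fail at `.compute()` time.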

My goal is to perform statistical tests and graph analysis, but when I follow the RAPIDS tutorial and run:

    x_tick.Spread.mean().compute()

I get the following error:

    TypeError: 'sub' operator not supported between <class 'cudf.core.column.numerical.NumericalColumn'> and <class 'cudf.core.column.string.StringColumn'>
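The message says cuDF was asked to apply subtraction (`sub`) with a numeric column on one side and a string column on the other. A minimal sketch of the same class of failure, using plain pandas on the CPU as a stand-in for cuDF (this is not the cuDF code path; `bid`, `ask_as_text`, `ask_numeric` and the `subtract_or_error` helper are hypothetical):

```python
import pandas as pd


def subtract_or_error(left, right):
    """Return left - right, or the TypeError message if the dtypes clash."""
    try:
        return left - right
    except TypeError as exc:
        return f"TypeError: {exc}"


bid = pd.Series([86.877, 86.891])
# Hypothetical: the same ask prices, but stored as text (object dtype).
ask_as_text = pd.Series(["86.837", "86.791"], dtype=object)
ask_numeric = pd.Series([86.837, 86.791])

print(subtract_or_error(bid, ask_as_text))  # float minus str raises TypeError
print(subtract_or_error(bid, ask_numeric))  # both numeric: element-wise result
```

The same values subtract cleanly once both sides are numeric, which is why the dtype of every column involved, in every partition, matters.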

What's happening?

Thank you

jack
  • There isn't enough information provided for the community to support you effectively. It's possible that one of the partitions of your `Spread` column is unexpectedly string-typed rather than float. As a starting point, you may want to try `x_tick.Spread.map_partitions(lambda x: x.dtype).compute().unique()` to inspect the dtype of every partition. [I recommend creating a minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Nick Becker May 11 '22 at 12:59
  • Another option is to enforce a numeric dtype using `dd.to_numeric` (see [this blog post](https://coiled.io/blog/dask-dataframe-to-numeric-object-to-float/)). Of course, the source of the non-numeric values should be investigated for a more thorough analysis. – SultanOrazbayev May 11 '22 at 15:28
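The coercion suggested in the last comment can be sketched with pandas' `pd.to_numeric`, standing in here for `dd.to_numeric` applied to a single hypothetical partition (the `raw` data is illustrative). `errors="coerce"` turns values that cannot be parsed as numbers into NaN instead of raising, and keeping the original tokens that failed makes it possible to investigate where they came from:

```python
import pandas as pd

# Hypothetical partition whose Spread column was parsed as text.
raw = pd.Series(["0.03", "n/a", "0.05"], name="Spread")

# errors="coerce" maps anything unparseable to NaN, so the result is float.
spread = pd.to_numeric(raw, errors="coerce")

# Original tokens that failed to parse: these are what to investigate.
unparsed = raw[spread.isna() & raw.notna()]

print(spread.dtype)
print(list(unparsed))
print(spread.mean())  # mean() skips NaN by default
```

Coercing silently drops the bad rows from statistics such as the mean, so it is worth checking how many values ended up in `unparsed` before trusting the results.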
