I am relatively new to Dask and have a large (12 GB) file that I wish to process. The file was imported from a SQL BCP file, and I want to wrangle it with Dask before uploading it to SQL. As part of this, I need to remove some leading whitespace, e.g. ' SQL Tutorial' changed to 'SQL Tutorial'. I would do this using pandas as follows:

df_train['column1'] = pd.core.strings.str_strip(df_train['column1'])

Dask doesn't seem to have this feature, as I get the following error:

AttributeError: module 'dask.dataframe.core' has no attribute 'strings'

Is there a memory-efficient way to do this using Dask?

Sql_Pete_Belfast

1 Answer

After a long search, I found it in the Dask API:

str
Namespace for string methods

So you can use:

df_train['column1'] = df_train['column1'].str.strip()
jezrael