Stripping whitespace from a Dask dataframe column

Question

I am relatively new to Dask and have a large (12 GB) file that I wish to process. This file was imported from a SQL BCP file that I want to wrangle with Dask before uploading it to SQL. As part of this, I need to remove some leading whitespace, e.g. ' SQL Tutorial' changed to 'SQL Tutorial'. I would do this using pandas as follows:

df_train['column1'] = pd.core.strings.str_strip(df_train['column1'])
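For reference, `pd.core.strings.str_strip` is an internal pandas helper rather than public API, and newer pandas releases reorganized that module, so it may not be importable there. The public equivalent is the `Series.str` accessor. A minimal sketch, using hypothetical sample data in place of the real file:

```python
import pandas as pd

# Hypothetical sample data standing in for the real 12 GB file.
df_train = pd.DataFrame({"column1": [" SQL Tutorial", "  Dask  ", "no padding"]})

# Public pandas API: strip whitespace from both ends of every string.
df_train["column1"] = df_train["column1"].str.strip()

# If only the leading (left-hand) whitespace should go, use lstrip instead:
# df_train["column1"] = df_train["column1"].str.lstrip()

print(df_train["column1"].tolist())  # ['SQL Tutorial', 'Dask', 'no padding']
```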

Dask doesn't seem to have this feature, as I get the error:

AttributeError: module 'dask.dataframe.core' has no attribute 'strings'

Is there a memory-efficient way to do this using dask?

Does `df_train['column1'] = df_train['column1'].str.strip()` work? — jezrael, Sep 22 '20 at 11:40
Maybe I'm wrong, but I thought that doesn't always work and can leave NaN values. — Sql_Pete_Belfast, Sep 22 '20 at 11:47
Not sure if it works in Dask, but is it possible to use `sep='\s*,\s*'` like [here](https://stackoverflow.com/a/35781099)? — jezrael, Sep 22 '20 at 12:10
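The regex-separator idea moves the cleanup to read time. A minimal pandas sketch, assuming a comma-separated export (BCP files often use other delimiters, so the pattern would need adjusting) and hypothetical sample rows. `dask.dataframe.read_csv` forwards extra keyword arguments to `pandas.read_csv`, so the same `sep`/`engine` pair should carry over, though that is untested here:

```python
import io

import pandas as pd

# Hypothetical comma-separated extract with spaces around the delimiters.
raw = "id,name,code\n1, SQL Tutorial ,x\n2,  Dask  ,y\n"

# The regex separator swallows any whitespace next to each comma.
# Regex separators require the Python parsing engine, which is slower
# than pandas' default C engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")

print(df["name"].tolist())  # ['SQL Tutorial', 'Dask']
```

This only removes whitespace adjacent to a separator, and a regex separator does not respect quoted fields that themselves contain commas.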

score 1 · Accepted Answer · answered Sep 22 '20 at 11:53


After a long search, I found it in the Dask API:

`str`: Namespace for string methods

So you can use:

df_train['column1'] = df_train['column1'].str.strip()


jezrael

