I have a csv which is too large to read into memory so I am attempting to use Dask to solve my problem. I am a regular Pandas user but lack experience with Dask. In my data I have a column "MONTHSTART" which I wish to interact with as a datetime object. However I cannot seem to get an output from the Dask dataframe despite my code working in a Pandas example.
I have read my csv in using dask
df = dd.read_csv(filename, dtype='str')
Then I am converting the dtype of the column to a datetime object
def parse_dates(df):
return pd.to_datetime(df['MONTHSTART'], format='%d%b%Y')
meta = ('time', pd.Timestamp)
df.map_partitions(parse_dates, meta=meta)
Finally I am trying to apply a lambda function to create a new column based on my datetime column
df['MONTHS_AGO'] = df.apply(
lambda y: (dt.date.today().year - y['MONTHSTART'].dt.year) * 12 +
(dt.date.today().month - y['MONTHSTART'].dt.month),
axis=1,
meta=meta)
I am unsure about the use of meta here, without it my code failed asking me to specify a meta.
Without meta I get
ValueError: Metadata inference failed, please provide `meta` keyword
With meta I get
AttributeError: ("'str' object has no attribute 'dt'", 'occurred at index 0')
Am I approaching this problem incorrectly? Is there a trick to applying lambda functions in Dask that I am missing?
Edit: I have obfuscated this information, and dropped lots of columns. I have tried to keep enough that the issue can be solved. df.head(2).to_dict example:
{'AGE_1': {0: '57', 1: '57'},
'APREM': {0: '347.08581006', 1: '347.08581006'},
'BUSINESS_1': {0: 'COMPUTERSERVICES', 1: 'COMPUTERSERVICES'},
'COMPULSORYEXCESSAD': {0: '0', 1: '0'},
'COVERTYPE': {0: 'Comprehensive', 1: 'Comprehensive'},
'DRIVINGRESTRICTION': {0: 'IOD', 1: 'IOD'},
'EARNEDTECH': {0: '35.438383793', 1: '15.356632977'},
'ENDDATE': {0: '13AUG2017', 1: '13AUG2017'},
'EXPMONTH': {0: 'EVY01APR2017', 1: 'EVY01AUG2017'},
'INFORCEATEOM': {0: '1', 1: '0'},
'LICENCETYPE_1': {0: 'FullUKCarLicence', 1: 'FullUKCarLicence'},
'MARITALSTATUS_1': {0: 'Partnered', 1: 'Partnered'},
'MILEAGERESTRICTION': {0: '8000', 1: '8000'},
'MIN_AGE': {0: '57', 1: '57'},
'MIN_EXP': {0: '18', 1: '18'},
'MIN_EXP_AGE': {0: '57', 1: '57'},
'MIN_EXP_LICENCETYPE': {0: 'FullUKCarLicence', 1: 'FullUKCarLicence'},
'MONTHEND': {0: '30APR2017', 1: '31AUG2017'},
'MONTHSTART': {0: '01APR2017', 1: '01AUG2017'},
'REGION': {0: 'East Anglia', 1: 'East Anglia'},
'STARTDATE': {0: '16FEB2017', 1: '16FEB2017'},
'TENURE': {0: '4th Renewal', 1: '4th Renewal'},
'TotalIncurredExclRI': {0: nan, 1: nan},
'VEHICLECOUNT': {0: '1', 1: '1'},
'VEHICLEKEPTOVERNIGHT': {0: 'Drive', 1: 'Drive'},
'VEHICLEMODIFICATION': {0: 'false', 1: 'false'},
'VEHICLENUMBER': {0: '1', 1: '1'},
'VEHICLEUSAGE': {0: 'Personal Business Use', 1: 'Personal Business Use'},
'VOLUNTARYEXCESS': {0: '250', 1: '250'}}