I'm using pandas .astype() to cast a dict of column names to their correct dtypes. It works for str, int, datetime64[ns], and float but is failing on timedelta64[ns]. When I run this I get ValueError: Could not convert object to NumPy timedelta.
import pandas as pd
import numpy as np
sample_row = pd.DataFrame([['g1',
                            3912841,
                            '2018-09-29 16:03:49',
                            4.040196e+09,
                            '1 days 15:49:38']],
                          columns=['group',
                                   'job_number',
                                   'submission_time',
                                   'maxvmem',
                                   'wait_time'])
sample_row = (sample_row.astype(dtype={'group':'str',
                                       'job_number':'int',
                                       'submission_time':'datetime64[ns]',
                                       'maxvmem':'float',
                                       'wait_time':'timedelta64[ns]'}))
I found this answer to a similar question, but it seems to suggest I'm using the correct dtype format.
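For reference, here is a minimal sketch of one possible workaround, assuming the wait_time strings are in a format that pd.to_timedelta can parse (as '1 days 15:49:38' is). It converts that column explicitly and leaves the other columns to .astype(). Whether .astype() itself accepts such strings may depend on the pandas version, so this is only an illustration, not a statement about why the error occurs.

```python
import pandas as pd

# Same sample row as in the question, with wait_time still stored as a string.
sample_row = pd.DataFrame(
    [['g1', 3912841, '2018-09-29 16:03:49', 4.040196e+09, '1 days 15:49:38']],
    columns=['group', 'job_number', 'submission_time', 'maxvmem', 'wait_time'],
)

# Parse the duration strings explicitly instead of relying on astype().
sample_row['wait_time'] = pd.to_timedelta(sample_row['wait_time'])

# Cast the remaining columns with astype(), as in the original code.
sample_row = sample_row.astype({
    'group': 'str',
    'job_number': 'int',
    'submission_time': 'datetime64[ns]',
    'maxvmem': 'float',
})

print(sample_row.dtypes)
```

The trade-off is that the timedelta column is handled outside the single dtype dict, so the conversion is no longer one call.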
Update: Here's the same code with the suggested change from @hpaulj:
import pandas as pd
import numpy as np
sample_row = pd.DataFrame([['g1',
                            3912841,
                            '2018-09-29 16:03:49',
                            4.040196e+09,
                            pd.Timedelta('1 days 15:49:38')]],
                          columns=['group',
                                   'job_number',
                                   'submission_time',
                                   'maxvmem',
                                   'wait_time'])
sample_row = (sample_row.astype(dtype={'group':'str',
                                       'job_number':'int',
                                       'submission_time':'datetime64[ns]',
                                       'maxvmem':'float',
                                       'wait_time':'timedelta64[ns]'}))
To confirm that the dtypes are set correctly:
for i in sample_row.loc[0, sample_row.columns]:
    print(type(i))
Output:
<class 'str'>
<class 'numpy.int32'>
<class 'pandas._libs.tslib.Timestamp'>
<class 'numpy.float64'>
<class 'pandas._libs.tslib.Timedelta'>
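The loop above prints the type of each element in the row. As a complementary check, the column-level dtypes can be read from DataFrame.dtypes. The sketch below uses a small hypothetical frame named df (not the full sample_row) to show both views side by side.

```python
import pandas as pd

# Hypothetical two-column frame mirroring the updated example above.
df = pd.DataFrame({
    'submission_time': pd.to_datetime(['2018-09-29 16:03:49']),
    'wait_time': [pd.Timedelta('1 days 15:49:38')],
})

# DataFrame.dtypes lists one dtype per column.
print(df.dtypes)

# Element access yields pandas scalar types, as in the loop above.
print(type(df.loc[0, 'submission_time']))
print(type(df.loc[0, 'wait_time']))
```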