Converting a large datetime64[D] Series (i.e. 900k rows of a DataFrame column) to str is taking too long. How can I speed it up?
import pandas as pd
df = pd.DataFrame(['2021-10-01']*900000, columns=['date']) # 0.025286900 seconds
df = df.assign(date=df['date'].astype('datetime64[D]')) # 0.105065900 seconds
# Why is converting from datetime to str so slow?
df.assign(date=df['date'].dt.strftime('%Y-%m-%d')) # 5.600835100 seconds.
txt = str(df) # 0.006202600 seconds
# Converting the entire DataFrame to a str is much faster
# than converting a column directly, despite a similar display format!
There is a related question that asks how to convert quickly from str to datetime. But my bottleneck is (surprisingly) the inverse: converting from datetime64[D] to str is far too slow.
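
One direction that might be worth timing is to skip the .dt.strftime call and let NumPy format the underlying datetime64 values at day precision with np.datetime_as_string. The following is only a sketch under a few assumptions that are not in the code above: the column is timezone-naive, it has no missing values (NumPy renders NaT as the string 'NaT', whereas strftime yields a missing value), and the fixed ISO format YYYY-MM-DD is what is wanted. It uses pd.to_datetime for the setup and a smaller frame than the 900k rows, purely to keep the example quick. No speedup figure is claimed here; it should be measured on the real data.

```python
import numpy as np
import pandas as pd

# Smaller frame than the 900k rows in the question, to keep the sketch quick.
df = pd.DataFrame(['2021-10-01'] * 1000, columns=['date'])
df = df.assign(date=pd.to_datetime(df['date']))

# Baseline from the question: format through the .dt accessor.
slow = df['date'].dt.strftime('%Y-%m-%d')

# Possible alternative: format the raw datetime64 array in NumPy,
# truncated to day precision, then wrap it back into a Series.
# Assumes tz-naive values and no NaT entries.
fast = pd.Series(
    np.datetime_as_string(df['date'].to_numpy(), unit='D'),
    index=df.index,
    name='date',
)

# Both approaches should produce the same strings for this data.
print(fast.head())
```

If the column contains many repeated dates, as in the example data, another option to consider is formatting only the unique values and mapping the results back onto the column.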