I'm using pandas to load data from Excel with the resulting DataFrame containing both strings and dates. The columns containing strings are of dtype "object" while the date-columns are of dtype "datetime64[ns]". At some point in my code I need to convert one column from datetime to string for writing back to Excel, but pandas will not let me do that if I try it the way that would be most obvious to me and seems to be the recommended way of doing it according to documentation: using .loc to get the column to be changed and assign them with the same column converted to strings.
I have found ways to circumvent the problem and get pandas to do what I need, but either this is a bug or I do not understand some underlying mechanic which could come back to bite me in the longer run, hence my question.
The code to reproduce this (occurs in both pandas 2.0.0 and 2.0.1, this might cause the problem) is as follows (in the actual DataFrame I am using there are many more than a single column):
import pandas as pd
not_yet_datetime_df = pd.DataFrame([["2023-01-06", "2023-01-06", "2023-01-06", "2023-01-06", "2023-01-06"]]).T
datetime_df = not_yet_datetime_df.astype("datetime64[ns]")
datetime_df.loc[:, 0] = datetime_df.loc[:, 0].dt.strftime("%d.%m.%Y")
datetime_df.loc[:, 0] = datetime_df.loc[:, 0].astype("object") # neither of these two will work for me
print(datetime_df.dtypes) # will return "datetime64[ns]" for this single column
There are multiple ways to circumvent this that work for me, including simply replacing line 5 with
datetime_df[0] = datetime_df.loc[:, 0].dt.strftime("%d.%m.%Y")
(omitting the .loc left of the equals sign) and I at least can get the column to being "object" dtype with datetime_df = datetime_df.astype({0:"object"})
, but I don't quite understand why especially the first solution works and what I misunderstood about .loc - or about datetimes in general.
I read a bit into the pandas 2.0.0 change on returning views vs. copies but to my (limited) understanding this should not be affected by any of the 2.0.0 changes.
Could anyone help me understand what's happening here under the hood? I like using .loc over assigning just with []-brackets and I feel like it's not as intuitive as I had hoped.