I have a dataset that is large enough that Pandas runs into memory errors, so I want to use Polars instead. I am stuck on converting an integer representation of a date into two new columns: PeriodDate and LaggedDate. On a sample, I can do this in Pandas using the following:
df['PeriodDate'] = pd.to_datetime(df['IntegerDate'],format='%Y%m')
df['LaggedDate'] = df['PeriodDate'] - pd.DateOffset(months=1)
In Polars, I have tried the following:
df.with_columns(
    pl.col('IntegerDate').str.strptime(pl.Datetime, "%Y%m")
)
This raises the following error:
SchemaError: Series of dtype: Int64 != Utf8.
For reference, the 'IntegerDate' column holds integers in YYYYMM form: 202005, 202006, etc.
I haven't been able to find good examples of how to do this in Polars, so any help would be greatly appreciated.
Thanks!