I would like to convert my timestamp columns into a regressor for the ML model. I need it to be a spline - one continuous variable (one column in the design matrix X), as opposed to a one-hot codded discretized intervals.
In my original data, I have timestamp only as a string at my disposal, and using a few pandas tricks I am able to get what I need.
However, I have a problem when creating the time differences, where I would like to calculate the time distance from the minimal column's value to the value of the current row.
I am proving simple example down below:
import pandas as pd
# Create dataset
df = pd.DataFrame({"Time": ["071323", "081326", "101405", "111545", "124822", "131112"]})
df
df = df.assign(timestamp=lambda d: pd.to_datetime(d["Time"], format="%H%M%S").dt.time)
df.assign(time_trend_from_min = lambda d: d['timestamp'] - d['timestamp'].min())
# The values should be 1 hour, 0 minutes and 3 seconds etc.
The second problem I have is that when I only convert it using pd.to_datetime(d["Time"], format="%H%M%S").dt.time
is is stored as a object
type and thus e.g. patsy would dummy codded every single unique values. I need to create 1D time series of differences, where on the X-axis we have index and y-axis we have the time distance from minimal values.
I am sure it is literally 2 lines of code needed however I am not able to crack it on my own