I'm working with 3,706 daily time series stored in the same file, but they don't all cover the same time span.
Some of the series start on 01/01 and end on 12/31, while others only run from 11/24 to 12/25.
The dataframe looks something like this (sorry, I can't share the real data because it's confidential):
Date | Product | Customer |
---|---|---|
01/01 | 1 | Cust 1 |
01/01 | 5 | Cust 1 |
01/02 | 1 | Cust 1 |
01/02 | 1 | Cust 2 |
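
To make the span problem concrete, here is a small sketch that measures how much each series covers. It runs on made-up rows with the same columns as the table above; the year 2023 and the extra 01/04 row are assumptions added only so that one series has a gap.

```python
import pandas as pd

# Made-up rows with the same columns as the table above (not the real data).
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02", "2023-01-04"]
    ),
    "Product": [1, 5, 1, 1, 1],
    "Customer": ["Cust 1", "Cust 1", "Cust 1", "Cust 2", "Cust 1"],
})

# One row per (Product, Customer) series: first date, last date, observed days.
spans = (
    df.groupby(["Product", "Customer"])["Date"]
      .agg(start="min", end="max", n_days="nunique")
      .reset_index()
)

# Calendar days between start and end that have no row at all.
spans["missing_days"] = (spans["end"] - spans["start"]).dt.days + 1 - spans["n_days"]
print(spans)
```

Each row of `spans` is one series, so the differing `start` and `end` values, plus any `missing_days` inside a series, show where the series disagree.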
I tried forecasting with the Darts library and an LGBM model, but it raises an error, apparently because of the different time spans.
Which approach should I use to put all the series on the same time span? If there is no good option, is there another model that can handle this?
I tried to add the missing dates with the code below, but it just crashes my computer. I assume that is because it uses too much memory.
    import pandas as pd

    # Assumes df['Date'] has already been converted to datetime
    unique_combinations = df[['Product', 'Customer']].drop_duplicates()
    # First and last observed date per series; computed from df, since unique_combinations has no Date column
    date_ranges = df.groupby(['Product', 'Customer'])['Date'].agg(['min', 'max'])
    start_date = pd.to_datetime('2023-01-01')
    end_date = pd.to_datetime('2023-12-31')
    date_range = pd.date_range(start_date, end_date, freq='D')
    template_df = pd.DataFrame({'Date': date_range})
    merged_dfs = []
    for product, customer in unique_combinations.itertuples(index=False):
        date_min, date_max = date_ranges.loc[(product, customer), ['min', 'max']]
        merged_df = pd.merge(template_df, df[(df['Product'] == product) & (df['Customer'] == customer)], on='Date', how='left')
        merged_df = merged_df[(merged_df['Date'] >= date_min) & (merged_df['Date'] <= date_max)]  # Optional: trim the merged DataFrame to the available date range
        merged_df = merged_df.assign(Product=product, Customer=customer)  # Keep the series keys on the rows added for missing dates
        merged_dfs.append(merged_df)
    final_df = pd.concat(merged_dfs, ignore_index=True)
    final_df = final_df.fillna(0)  # Replace missing values with zeros or another appropriate fill value
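
For comparison, a loop-free version of the same padding idea could look like the sketch below. It is only a sketch under assumptions: the data is made up, `Sales` is a hypothetical value column (the table in this post does not show one), and padding with zeros is assumed to be an acceptable fill. It builds every (Product, Customer, Date) combination once with a cross join and then does a single left merge, instead of one merge per series.

```python
import pandas as pd

# Made-up data; "Sales" is a hypothetical value column, not part of the real file.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-03", "2023-11-24", "2023-11-25"]),
    "Product": [1, 1, 5, 5],
    "Customer": ["Cust 1", "Cust 1", "Cust 2", "Cust 2"],
    "Sales": [10.0, 7.0, 3.0, 4.0],
})

# Common calendar that every series should cover.
calendar = pd.DataFrame({"Date": pd.date_range("2023-01-01", "2023-12-31", freq="D")})

# Collapse duplicate rows so there is at most one value per series per day.
daily = df.groupby(["Product", "Customer", "Date"], as_index=False)["Sales"].sum()

# Every (Product, Customer) pair crossed with every calendar day.
pairs = df[["Product", "Customer"]].drop_duplicates()
grid = pairs.merge(calendar, how="cross")

# Attach the observed values; days without a row get 0.
final_df = grid.merge(daily, on=["Product", "Customer", "Date"], how="left")
final_df["Sales"] = final_df["Sales"].fillna(0.0)
```

With 3,706 series and 365 days the grid has about 1.35 million rows, which is built in one pass rather than through thousands of filtered merges.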
I expected every time series to end up with the same date span.
Thanks in advance.