
I'm working with 3706 time series located in the same file, but the time span differs for some of them.

They are daily time series. Some of them start on 01/01 and end on 12/31, but others only run from 11/24 to 12/25.

The DataFrame looks something like this (sorry, I can't provide the real data because it's confidential):

Date    Product  Customer
01/01   1        Cust 1
01/01   5        Cust 1
01/02   1        Cust 1
01/02   1        Cust 2
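Because the real data can't be shared, a small synthetic frame with the same layout makes the problem reproducible. This is a sketch: all values are invented, the year 2023 is assumed, and `Quantity` is a hypothetical target column (the sample above doesn't show which column holds the value being forecast). Grouping by `Product`/`Customer` and taking the min/max of `Date` shows that each series has its own span:

```python
import pandas as pd

# Synthetic stand-in for the confidential data (all values are made up).
# "Quantity" is a hypothetical target column, not part of the original post.
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02",
             "2023-11-24", "2023-12-25"],
    "Product": [1, 5, 1, 1, 7, 7],
    "Customer": ["Cust 1", "Cust 1", "Cust 1", "Cust 2", "Cust 3", "Cust 3"],
    "Quantity": [3, 2, 4, 1, 5, 6],
})
df["Date"] = pd.to_datetime(df["Date"])

# First and last observed date for every (Product, Customer) series
spans = df.groupby(["Product", "Customer"])["Date"].agg(["min", "max"])
print(spans)
```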

I tried using the Darts library with a LightGBM model to forecast them, but it seems to raise an error because of the different time spans.

Which approach should I use to put all of them on the same time span? If there's no good option, is there another model that can handle series with different spans?

I tried to add the missing dates using the following code, but it crashes my computer. I assume that's because it uses too much memory.

import pandas as pd

# Dates must be real datetimes for the merge and the comparisons below
df['Date'] = pd.to_datetime(df['Date'])

unique_combinations = df[['Product', 'Customer']].drop_duplicates()

# First/last date per series (computed on df, because unique_combinations has no Date column)
date_ranges = df.groupby(['Product', 'Customer'])['Date'].agg(['min', 'max'])

start_date = pd.to_datetime('2023-01-01')
end_date = pd.to_datetime('2023-12-31')
date_range = pd.date_range(start_date, end_date, freq='D')
template_df = pd.DataFrame({'Date': date_range})

merged_dfs = []
for product, customer in unique_combinations.itertuples(index=False):
    date_min = date_ranges.loc[(product, customer), 'min']
    date_max = date_ranges.loc[(product, customer), 'max']
    series_df = df[(df['Product'] == product) & (df['Customer'] == customer)]
    merged_df = pd.merge(template_df, series_df, on='Date', how='left')
    # Days missing from this series have NaN keys after the left merge; restore them
    merged_df['Product'] = product
    merged_df['Customer'] = customer
    # Optional: trim to this series' own date range (drop this line to keep the common 2023 span)
    merged_df = merged_df[(merged_df['Date'] >= date_min) & (merged_df['Date'] <= date_max)]
    merged_dfs.append(merged_df)

final_df = pd.concat(merged_dfs, ignore_index=True)

final_df.fillna(0, inplace=True)  # Replace missing values with zeros or an appropriate fill value
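A vectorized alternative avoids the per-series Python loop and the repeated boolean filtering of the whole frame: cross every (Product, Customer) pair with one shared daily calendar, then left-join the observed values once. This is a sketch under stated assumptions: `Quantity` is a hypothetical value column, missing days are filled with 0, duplicate (Product, Customer, Date) rows are summed, and the calendar defaults to the earliest and latest date found anywhere in the data. With about 3706 series over 365 days the result is roughly 1.35 million rows. `how="cross"` requires pandas 1.2 or later.

```python
import pandas as pd


def pad_to_common_span(df, value_col="Quantity", start=None, end=None):
    """Return one row per (Product, Customer, Date) over a shared daily calendar.

    Missing days get value 0. `value_col` is a hypothetical column name.
    """
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])

    # Collapse duplicate (Product, Customer, Date) rows so every key is unique
    daily = df.groupby(["Product", "Customer", "Date"], as_index=False)[value_col].sum()

    # Shared calendar: defaults to the earliest/latest date anywhere in the data
    start = pd.Timestamp(start) if start is not None else daily["Date"].min()
    end = pd.Timestamp(end) if end is not None else daily["Date"].max()
    calendar = pd.DataFrame({"Date": pd.date_range(start, end, freq="D")})

    # Every series crossed with every day (vectorized, no Python loop)
    keys = daily[["Product", "Customer"]].drop_duplicates()
    grid = keys.merge(calendar, how="cross")

    # Left-join the observed values; days with no record become 0
    out = grid.merge(daily, on=["Product", "Customer", "Date"], how="left")
    out[value_col] = out[value_col].fillna(0)
    return out


# Tiny made-up example: two series with different spans
sample = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-03", "2023-01-02"],
    "Product": [1, 1, 5],
    "Customer": ["Cust 1", "Cust 1", "Cust 2"],
    "Quantity": [3, 4, 2],
})
padded = pad_to_common_span(sample)
print(padded)
```

Passing `start="2023-01-01", end="2023-12-31"` would pin every series to the full year instead of the data's own extent. Because the filled `Quantity` column contains NaN before `fillna`, it comes back as a float column.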

I expected every time series to end up with the same date span.

Thanks in advance.
