I have the following dataframe:

Year_Month Country  Type   Data
 2019_01    France  IT     20
 2019_02    France  IT     30
 2019_03    France  IT     40
 2019_01    France  AT     10
 2019_02    France  AT     15
 2019_03    France  AT     20

I want to forecast Year_Month "2019_04" separately for each combination of Country and Type, here (France, IT) and (France, AT).

So, for example I should get results as follows:

Forecasts for (France,IT):

Year_Month Country  Type   Data
 2019_04    France  IT     50

Forecasts for (France,AT):

Year_Month Country  Type   Data
 2019_04    France  AT     25

How should the loop be designed so that the function containing the model runs for one combination at a time and saves the output of each run?

ibarrond
user6016731
  • you should simply start by defining your train set according to the country you are making the prediction on. So `train_it = df[df['Type'] == 'IT']` and `train_at = df[df['Type'] == 'AT']`. – Celius Stingher Jan 21 '20 at 14:41
  • We are only talking about 2 combinations here. What about when I have 10? How can we do it dynamically? – user6016731 Jan 21 '20 at 14:43

2 Answers

Your problem leaves several questions open (which model do you want to use? How far into the future do you want to predict? ...), but you could start with LinearRegression from sklearn.linear_model in scikit-learn and compute a forecast for each type:

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Generate data from the example
df = pd.DataFrame({
 'Year_Month': {0: '2019_01',1: '2019_02',2: '2019_03',3: '2019_01',4: '2019_02',5: '2019_03'},
 'Country': {   0: 'France', 1: 'France', 2: 'France', 3: 'France', 4: 'France', 5: 'France'},
 'Type': {0: 'IT', 1: 'IT', 2: 'IT', 3: 'AT', 4: 'AT', 5: 'AT'},
 'Data': {0: 20, 1: 30, 2: 40, 3: 10, 4: 15, 5: 20}})

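# Generate our empty regressor to fit the trend.
regressor = LinearRegression()

result = {}
# loop on every type
for t in df['Type'].unique():
    # slice
    df_slice = df[df['Type'] == t]

    # convert 'YYYY_MM' strings to integers such as 201901 so they can serve as a numeric feature
    # (note: these integers are not evenly spaced across a year boundary, e.g. 201912 -> 202001)
    X = df_slice['Year_Month'].str.replace('_', '').astype(int).to_numpy().reshape(-1, 1)

    # train the regressor (each call to fit replaces the previous fit)
    regressor.fit(X=X, y=df_slice['Data'])

    # predict the value for 2019_04 and keep it as a scalar
    result[t] = {'predicted_value': regressor.predict(np.array([[201904]]))[0]}

# build dataframe with all your results
final_df = pd.DataFrame(result)

#                    IT    AT
# predicted_value  50.0  25.0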
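To handle any number of (Country, Type) combinations without listing them by hand, the same idea can be written with `groupby` on both columns. The following is only a sketch under a few assumptions: `np.polyfit` is used as a stand-in straight-line fit (swap in whatever model you actually need), each group is assumed to have one row per consecutive month, and the helper names `forecast_next_month` and `next_year_month` are made up for this illustration.

```python
import numpy as np
import pandas as pd


def forecast_next_month(group: pd.DataFrame) -> float:
    """Fit a straight line through one group's history and extrapolate one step."""
    group = group.sort_values("Year_Month")
    # Use 0, 1, 2, ... as the time axis so year boundaries stay evenly spaced
    # (assumption: one row per consecutive month, no gaps).
    steps = np.arange(len(group))
    slope, intercept = np.polyfit(steps, group["Data"].to_numpy(dtype=float), 1)
    return float(slope * len(group) + intercept)


def next_year_month(year_month: str) -> str:
    """Return the 'YYYY_MM' label that follows the given one."""
    period = pd.Period(year_month.replace("_", "-"), freq="M") + 1
    return period.strftime("%Y_%m")


# Same example data as in the question
df = pd.DataFrame({
    "Year_Month": ["2019_01", "2019_02", "2019_03", "2019_01", "2019_02", "2019_03"],
    "Country": ["France"] * 6,
    "Type": ["IT", "IT", "IT", "AT", "AT", "AT"],
    "Data": [20, 30, 40, 10, 15, 20],
})

rows = []
# groupby yields one sub-frame per distinct (Country, Type) pair
for (country, type_), group in df.groupby(["Country", "Type"], sort=False):
    rows.append({
        "Year_Month": next_year_month(group["Year_Month"].max()),
        "Country": country,
        "Type": type_,
        "Data": forecast_next_month(group),
    })

# One forecast row per combination, in the same layout as the input
forecasts = pd.DataFrame(rows)
print(forecasts)
```

On the example data this gives one "2019_04" row per combination, with values of about 50 for (France, IT) and 25 for (France, AT). Collecting plain dicts in `rows` and building the frame once at the end keeps the loop independent of how many combinations exist.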
ibarrond
  • Reusing part of the code in https://stackoverflow.com/questions/42123786/regress-by-group-in-pandas-dataframe-and-add-columns-with-forecast-values-and-be – ibarrond Jan 21 '20 at 16:02
  • Thanks! I am trying to run hyperopt. I have explained everything here: https://stackoverflow.com/questions/59827427/hyperopt-on-multiple-subsets-of-a-dataframe – user6016731 Jan 21 '20 at 16:09
  • Why two separate questions? – ibarrond Jan 21 '20 at 16:10
  • Anyhow, you can reuse this logic and replace with your forecasting algorithms – ibarrond Jan 21 '20 at 16:11

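Thanks! What worked for me was building a list of (country, type) combinations and looping over it:

from hyperopt import Trials, fmin, tpe

comboList = list(zip(Map['country'], Map['type']))

all_losses = []  # one list of trial losses per combination
for i, combo in enumerate(comboList):
    print(combo)
    # keep only the rows for this (country, type) pair
    subset = data[(data['country'] == combo[0]) & (data['type'] == combo[1])]
    subset = subset[["Data"]]

    # NOTE: data(...) here is the train/test split helper; it must not share
    # its name with the DataFrame `data` used above, or one will shadow the other
    x_train_ts, y_train_ts, x_test_ts, y_test_ts = data(subset, 10, 1)

    # run the hyperparameter search for this combination
    trials = Trials()
    best = fmin(create_model_hypopt,
                space=search_space,
                algo=tpe.suggest,
                max_evals=1,
                trials=trials)

    # save this combination's losses (appending a list to itself would not do that)
    all_losses.append(trials.losses())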
user6016731