I have a table looks like this. (I have a bigger dataset but this explains the logic)
Date | Continent | Value1 | Value2 | Value3 |
---|---|---|---|---|
03.05.2023 | Europe | 0,50 | %99,90 | 105,23 |
03.05.2023 | Asia | 0,75 | %87,51 | 112,25 |
04.05.2023 | Europe | 0,53 | %99,92 | 109,42 |
04.05.2023 | Asia | 0,89 | %79,52 | 132,35 |
I am using this table to detect anomalies on different values. But I only get "ds" as Date and "y" as Value and I write the results to different sheets (Like Value1_Results, Value2_Results...) in the same Excel using ExcelWriter.
I want to get detailed and add the other variable named Continent to the formula, so I can analyze not only date but continent+date by continent+date for example. So I need a little bit of help here.
df_question = pd.read_excel("question.xlsx", sheet_name="SHeet2", engine='openpyxl')
I firstly read the excel here
datas = []
for col in df_question.columns[2:]:
datas.append(df_question[['Continent','Date', col]])
print(datas[1])
And here, I added the new column "Continent"
names = []
for data in datas:
names.append(data.columns[2])
data.columns = ["Site","ds", "y"]
data = pd.DataFrame(data=data)
# print(data.head())
print(names)
Now, I am saving the value names for later.
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.float_format", lambda x: "%.3f" % x)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("display.width", 500)
def fit_predict_model(dataframe):
# seasonality, holidays
# it can integrate football matches to data.
m = Prophet(yearly_seasonality = True, daily_seasonality = True)
m.add_country_holidays(country_name="TR")
m = m.fit(dataframe)
forecast = m.predict(dataframe)
forecast["fact"] = dataframe["y"].reset_index(drop = True)
return forecast
preds = []
for i, data in enumerate(datas):
preds.append(fit_predict_model(data))
Now I am traning the program and detecting anomalies.
pd.options.plotting.backend = "plotly"
for i, pred in enumerate(preds):
fig = pred.plot(x='ds', y=["yhat_lower", "fact", "yhat_upper", "yhat"])
fig.update_layout(title=names[i])
fig.show()
Here, I am showing the plots but I could not figure out how to add the Continent
variable.
def detect_anomalies(forecast):
forecasted = forecast[["ds","trend", "yhat", "yhat_lower", "yhat_upper", "fact"]].copy()
#forecast["fact"] = df["y"]
forecasted["anomaly"] = 0
forecasted.loc[forecasted["fact"] > forecasted["yhat_upper"], "anomaly"] = 1
forecasted.loc[forecasted["fact"] < forecasted["yhat_lower"], "anomaly"] = -1
#anomaly importances
forecasted["importance"] = 0
forecasted.loc[forecasted["anomaly"] ==1, "importance"] = \
(forecasted["fact"] - forecasted["yhat_upper"])/forecast["fact"]
forecasted.loc[forecasted["anomaly"] ==-1, "importance"] = \
(forecasted["yhat_lower"] - forecasted["fact"])/forecast["fact"]
return forecasted
preds1 = []
for i, data in enumerate(datas):
preds1.append(detect_anomalies(preds[i]))
Here I am detecting the anomalies and saving them into preds1
df. But it only gets the Date
information.
with pd.ExcelWriter(r'C:\Users\user\OneDrive\Belgeler\anomaly\question.xlsx') as writer:
for i, data in enumerate(preds1):
data.sample(32).to_excel(writer, sheet_name=names[i], index=False)
Lastly, I am writing these results to Excel. But it only writes a single DF for each value and does not include Continent
information.