2

I have a table looks like this. (I have a bigger dataset but this explains the logic)

Date Continent Value1 Value2 Value3
03.05.2023 Europe 0,50 %99,90 105,23
03.05.2023 Asia 0,75 %87,51 112,25
04.05.2023 Europe 0,53 %99,92 109,42
04.05.2023 Asia 0,89 %79,52 132,35

I am using this table to detect anomalies on different values. But I only get "ds" as Date and "y" as Value and I write the results to different sheets (Like Value1_Results, Value2_Results...) in the same Excel using ExcelWriter.

I want to get detailed and add the other variable named Continent to the formula, so I can analyze not only date but continent+date by continent+date for example. So I need a little bit of help here.

df_question = pd.read_excel("question.xlsx", sheet_name="SHeet2", engine='openpyxl')

I firstly read the excel here

datas = []
for col in df_question.columns[2:]:
    datas.append(df_question[['Continent','Date', col]])   
print(datas[1])

And here, I added the new column "Continent"

names = []
for data in datas:
    names.append(data.columns[2])
    data.columns = ["Site","ds", "y"]
    data = pd.DataFrame(data=data)
    # print(data.head())
print(names)

Now, I am saving the value names for later.

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.float_format", lambda x: "%.3f" % x)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("display.width", 500)

def fit_predict_model(dataframe):
    
    # seasonality, holidays
    # it can integrate football matches to data.
    m = Prophet(yearly_seasonality = True, daily_seasonality = True)
    m.add_country_holidays(country_name="TR")
    
    m = m.fit(dataframe)
    
    forecast = m.predict(dataframe)
    forecast["fact"] = dataframe["y"].reset_index(drop = True)
    return forecast

preds = []
for i, data in enumerate(datas):
    preds.append(fit_predict_model(data))

Now I am traning the program and detecting anomalies.

pd.options.plotting.backend = "plotly"
for i, pred in enumerate(preds):
    fig = pred.plot(x='ds', y=["yhat_lower", "fact", "yhat_upper", "yhat"])
    fig.update_layout(title=names[i])
    fig.show()

Here, I am showing the plots but I could not figure out how to add the Continent variable.

def detect_anomalies(forecast):
    forecasted = forecast[["ds","trend", "yhat", "yhat_lower", "yhat_upper", "fact"]].copy()
    #forecast["fact"] = df["y"]

    forecasted["anomaly"] = 0
    forecasted.loc[forecasted["fact"] > forecasted["yhat_upper"], "anomaly"] = 1
    forecasted.loc[forecasted["fact"] < forecasted["yhat_lower"], "anomaly"] = -1
    
    #anomaly importances
    
    forecasted["importance"] = 0
    forecasted.loc[forecasted["anomaly"] ==1, "importance"] = \
        (forecasted["fact"] - forecasted["yhat_upper"])/forecast["fact"]
    forecasted.loc[forecasted["anomaly"] ==-1, "importance"] = \
        (forecasted["yhat_lower"] - forecasted["fact"])/forecast["fact"]
    return forecasted
preds1 = []
for i, data in enumerate(datas):
    preds1.append(detect_anomalies(preds[i]))

Here I am detecting the anomalies and saving them into preds1 df. But it only gets the Date information.

with pd.ExcelWriter(r'C:\Users\user\OneDrive\Belgeler\anomaly\question.xlsx') as writer:
    for i, data in enumerate(preds1):
        data.sample(32).to_excel(writer, sheet_name=names[i], index=False)

Lastly, I am writing these results to Excel. But it only writes a single DF for each value and does not include Continent information.

desertnaut
  • 57,590
  • 26
  • 140
  • 166
Arnve
  • 29
  • 5

0 Answers0