I have built an isolation forest to detect anomalies in a CSV file, and I would like to change the format of its output. Right now the anomalous rows are output as a pandas DataFrame, but I would like to write them to a JSON file in the following format:

{seconds: #seconds for that row, size2: size2, pages: #pages for that row}

I have attached some of the code and a sample of the data. Thank you so much!

model.fit(df[['label']])
df['anomaly'] = model.fit_predict(df[['size2', 'size3', 'size4']])
# df['anomaly'] = model.predict(df[['pages']])
print(model.predict(X_test))
anomaly = df.loc[df['anomaly']==-1]
anomaly_index = list(anomaly.index)
print(anomaly)

The output data looks something like this:

Unnamed:  seconds:  size2:  ...  size4:  pages:  anomaly:
1         40        32      ...  654     1       -1

1 Answer

I figured out a way to do this: I made multiple dictionaries, one mapping each row index to its timestamp and one mapping each row index to its label. I could then keep track of which indexes appeared in the output data and look up all the information for those rows in the dictionaries.
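
A minimal sketch of that dictionary approach, assuming the column names from the question (seconds, size2, pages, anomaly). The sample rows and the variable names seconds_by_index, size2_by_index, pages_by_index and records are made up for illustration and are not the original code:

```python
import json

import pandas as pd

# Hypothetical sample rows standing in for the real CSV data
df = pd.DataFrame({
    "seconds": [40, 41, 42],
    "size2": [32, 30, 31],
    "pages": [1, 2, 1],
    "anomaly": [-1, 1, -1],  # -1 marks a row the isolation forest flagged
})

# One dictionary per field, each mapping row index -> value
seconds_by_index = df["seconds"].to_dict()
size2_by_index = df["size2"].to_dict()
pages_by_index = df["pages"].to_dict()

# Indexes of the rows flagged as anomalies
anomaly_index = list(df.index[df["anomaly"] == -1])

# Build one record per anomalous row; int() guards against NumPy integer
# types, which json.dumps cannot serialize
records = [
    {
        "seconds": int(seconds_by_index[i]),
        "size2": int(size2_by_index[i]),
        "pages": int(pages_by_index[i]),
    }
    for i in anomaly_index
]

anomalies_json = json.dumps(records)
print(anomalies_json)
```

If the separate dictionaries are not needed for anything else, selecting the columns and calling pandas' own `to_json` should give the same records more directly, for example `df.loc[df["anomaly"] == -1, ["seconds", "size2", "pages"]].to_json(orient="records")`.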