I am working on my own implementation of the Isolation Forest algorithm, following this paper: https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf. I would like to know the theoretical upper bound on the path length. My implementation produces values slightly higher than the bound I derived, so I suspect I made a mistake somewhere. Since the bound corresponds to a highly unlikely worst case, I want to make sure I calculated it correctly.

This was my approach:

import numpy as np

def path_length_unsuccessful_bst(n):
    # c(n) from the paper: 2*H(n-1) - 2*(n-1)/n, with H(i) ~ ln(i) + Euler's constant
    return 2 * (np.log(n - 1) + 0.5772156649) - (2 * (n - 1) / n)

psi = 256  # Number of samples used to train each tree
max_depth = int(np.ceil(np.log2(psi)))  # Height limit of the tree, ceil(log2(psi)) in the paper
max_nodes_in_leaf_node = psi - max_depth  # Worst case: each level of the iTree removes only one point
max_path_length = max_depth + path_length_unsuccessful_bst(max_nodes_in_leaf_node)

However, I still get some path lengths above this threshold. Is there anything I am missing here?

  • I arrived at the same formula as you did (assuming `psi = 256`). Note that, if you follow the algorithm of the paper, you have to make sure you select your split point *p* in *(min, max]*, otherwise you may have empty external nodes and a max path length larger than your calculation. – nonDucor Jun 25 '23 at 21:46
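
To illustrate the point about the split interval, here is a minimal sketch (not the paper's reference code) that partitions a sample with *p* drawn from *(min, max]* and sends `x < p` to the left, so both children are always non-empty. The helper names `c`, `path_lengths` and `max_path_length_bound`, the Gaussian test data, and the choice to stop splitting when an attribute is constant are assumptions made for this example. Under those assumptions every recorded path length should stay at or below `max_depth + c(psi - max_depth)`, which is about 18.18 for `psi = 256`.

```python
import numpy as np

EULER_GAMMA = 0.5772156649

def c(n):
    # c(n) = 2*H(n-1) - 2*(n-1)/n with H(i) ~ ln(i) + Euler's constant.
    # Returning 0 for n <= 1 is a convention assumed here to avoid log(0).
    if n <= 1:
        return 0.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def max_path_length_bound(psi):
    # Worst case: every split on the path isolates exactly one point.
    limit = int(np.ceil(np.log2(psi)))
    return limit + c(psi - limit)

def path_lengths(X, idx, depth, limit, rng, out):
    # Recursively partition the rows `idx` of X and store h(x) in `out`.
    n = len(idx)
    if depth >= limit or n <= 1:
        out[idx] = depth + c(n)
        return
    q = rng.integers(X.shape[1])          # random attribute
    col = X[idx, q]
    lo, hi = col.min(), col.max()
    if lo == hi:                          # constant attribute: stop (assumption)
        out[idx] = depth + c(n)
        return
    # rng.random() is in [0, 1), so p lies in (lo, hi].
    p = hi - rng.random() * (hi - lo)
    left = col < p                        # min goes left, max goes right
    path_lengths(X, idx[left], depth + 1, limit, rng, out)
    path_lengths(X, idx[~left], depth + 1, limit, rng, out)

rng = np.random.default_rng(0)
psi = 256
limit = int(np.ceil(np.log2(psi)))
bound = max_path_length_bound(psi)
X = rng.normal(size=(psi, 2))             # synthetic data, for illustration only

worst = 0.0
for _ in range(200):                      # 200 random trees on the same sample
    h = np.empty(psi)
    path_lengths(X, np.arange(psi), 0, limit, rng, h)
    worst = max(worst, float(h.max()))

print(f"bound = {bound:.4f}, largest observed path length = {worst:.4f}")
```

If the split were instead allowed to leave one child empty (for example `p == min` with `x < p`), the depth would grow without removing a point, and a leaf at the height limit could hold more than `psi - max_depth` points, which is one way to end up above the bound.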
