I am currently working on my own implementation of the Isolation Forest algorithm, following this paper: https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf. I would like to know the theoretical upper bound for the path length. My implementation produces some path lengths slightly higher than the bound I calculated, so I am wondering whether I calculated it correctly. This bound describes a worst case that is highly unlikely to occur, so I want to make sure it is computed properly.
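For reference, these are the quantities from the paper that the bound is built from:

- h(x) is the path length of a point that ends in an external node at depth e holding `size` training points.
- c(n) is the average-path-length adjustment.
- l is the tree height limit.

The last inequality is not stated in the paper. It is the worst-case argument behind my approach below, which assumes two things. First, every split sends at least one point to each side, so a node at depth e holds at most ψ − e points. Second, external nodes above depth l hold at most one point, so they contribute only h(x) = e < l.

$$
h(x) = e + c(\mathrm{size}), \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
$$

$$
l = \lceil \log_2 \psi \rceil, \qquad
h(x) \le l + c(\psi - l)
$$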
This was my approach:
```python
import numpy as np

def path_length_unsuccessful_bst(n):
    # c(n) from the paper: average path length of an unsuccessful search in a BST,
    # with the harmonic number H(n-1) approximated by ln(n-1) + 0.5772156649
    return 2 * (np.log(n - 1) + 0.5772156649) - (2 * (n - 1) / n)

psi = 256  # Number of samples used to train each tree
max_depth = int(np.ceil(np.log2(psi)))  # Height limit l = ceil(log2(psi)) from the paper; 8 for psi = 256
max_nodes_in_leaf_node = psi - max_depth  # Worst case: each level of the iTree isolates only one point
max_path_length = max_depth + path_length_unsuccessful_bst(max_nodes_in_leaf_node)
```
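To put a number on it, here is a self-contained sketch that evaluates the same bound for ψ = 256 in two ways. The first uses the paper's ln(i) + 0.5772156649 approximation of the harmonic number. The second uses the exact harmonic number H(n−1) = 1 + 1/2 + … + 1/(n−1). The exact variant (`c_exact`) is my own addition: it is an assumption about how some implementation might compute c(n), while the paper itself uses the approximation.

```python
import numpy as np

EULER_GAMMA = 0.5772156649

def c_approx(n):
    # c(n) with H(n-1) approximated by ln(n-1) + Euler's constant, as in the paper
    return 2 * (np.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n

def c_exact(n):
    # Same c(n), but with the exact harmonic number H(n-1) = sum_{k=1}^{n-1} 1/k
    harmonic = np.sum(1.0 / np.arange(1, n))
    return 2 * harmonic - 2 * (n - 1) / n

psi = 256
height_limit = int(np.ceil(np.log2(psi)))  # l = 8 for psi = 256
leaf_size = psi - height_limit             # 248 points left in the worst case

bound_approx = height_limit + c_approx(leaf_size)
bound_exact = height_limit + c_exact(leaf_size)
print(f"bound (ln approximation): {bound_approx:.4f}")
print(f"bound (exact harmonic):   {bound_exact:.4f}")
```

For ψ = 256 the two bounds come out at roughly 18.181 and 18.185. The choice of harmonic-number formula alone therefore shifts the bound by about 0.004.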
However, I still get some path lengths above this bound. Is there anything I am missing?
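One way to narrow this down is to build trees as described in the paper's iTree construction and compare the largest observed path length with the bound. Below is a minimal sketch of that check. The names `build_itree` and `path_length` and the dict-based node layout are illustrative only and do not come from any library. Two assumptions are built in:

- The data are continuous with no ties, so the split value, drawn between the attribute's min and max, always sends at least one point to each side. The "one point removed per level" argument relies on this.
- c(n) is taken as 0 for n ≤ 1.

```python
import numpy as np

EULER_GAMMA = 0.5772156649

def c(n):
    # c(n) from the paper; taken as 0 for n <= 1 (assumption: no adjustment for a single point)
    if n <= 1:
        return 0.0
    return 2 * (np.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n

def build_itree(X, depth, height_limit, rng):
    # External node when the height limit is reached or at most one point remains
    if depth >= height_limit or len(X) <= 1:
        return {"size": len(X)}
    q = rng.integers(X.shape[1])                    # random split attribute
    p = rng.uniform(X[:, q].min(), X[:, q].max())   # random split value in [min, max)
    goes_left = X[:, q] < p
    return {
        "att": q,
        "val": p,
        "left": build_itree(X[goes_left], depth + 1, height_limit, rng),
        "right": build_itree(X[~goes_left], depth + 1, height_limit, rng),
    }

def path_length(x, node, depth=0):
    # h(x) = e + c(size) once x reaches an external node
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["att"]] < node["val"] else node["right"]
    return path_length(x, child, depth + 1)

rng = np.random.default_rng(0)
psi = 256
height_limit = int(np.ceil(np.log2(psi)))
bound = height_limit + c(psi - height_limit)

max_seen = 0.0
for _ in range(20):
    X = rng.normal(size=(psi, 3))                    # continuous data, so no ties
    tree = build_itree(X, 0, height_limit, rng)
    queries = np.vstack([X, rng.normal(size=(100, 3))])
    max_seen = max(max_seen, max(path_length(x, tree) for x in queries))

print(f"bound = {bound:.4f}, largest path length seen = {max_seen:.4f}")
```

Here the trees and the bound share the same c(n). A tree built this way can therefore never exceed the bound, because c(n) is increasing for n ≥ 2. If the same comparison fails on my own trees, the difference has to come from how they are built or scored, for example ties in the data.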