What is the time complexity of the PyTables file operation get_node?

Let's say I query

mynode = myfile.get_node(where='group0/group1/..../groupN', name='mynode')

How does this operation scale with N, the number of parent groups of mynode? Linearly, i.e. O(N); worse, O(N*d), where d is the average branching factor of my HDF5 node tree; or very fast, O(1), because PyTables internally keeps some sort of dictionary of all paths?

Thanks a lot!

SmCaterpillar

1 Answer

HDF5 implements nodes as a B-tree, so get_node() has a time complexity of O(log N) [1]. PyTables does not preload these paths into a dictionary to make this O(1). However, once a node has been loaded, it is tagged as 'alive' and goes into an alive_nodes dictionary, so subsequent access is O(1) as long as the node remains in memory. In effect this is a lazy O(1) operation: you pay the O(log N) cost once, up front.
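To make the "pay once, then O(1)" pattern concrete, here is a toy model in plain Python. It is only a sketch and not PyTables' actual code: the class `LazyNodeStore`, the counter `slow_lookups`, and the example paths are hypothetical, and a binary search over a sorted list stands in for the O(log N) B-tree lookup. Only the name `alive_nodes` is borrowed from the description above.

```python
import bisect


class LazyNodeStore:
    """Toy model of lazy node caching (hypothetical, not PyTables internals)."""

    def __init__(self, paths):
        # A sorted list stands in for the on-disk index; searching it
        # with bisect costs O(log N), like a B-tree lookup.
        self._index = sorted(paths)
        # Nodes that have already been loaded, keyed by path.
        self.alive_nodes = {}
        # Counts how often the slow O(log N) path was taken.
        self.slow_lookups = 0

    def get_node(self, path):
        # Fast path: an already loaded node is a dictionary hit, O(1) on average.
        if path in self.alive_nodes:
            return self.alive_nodes[path]
        # Slow path: binary search of the index, O(log N).
        self.slow_lookups += 1
        i = bisect.bisect_left(self._index, path)
        if i == len(self._index) or self._index[i] != path:
            raise KeyError(path)
        node = {"path": path}  # placeholder for a real loaded node object
        self.alive_nodes[path] = node
        return node


store = LazyNodeStore(f"/group{i}/mynode" for i in range(1000))
first = store.get_node("/group42/mynode")   # slow lookup, then cached
second = store.get_node("/group42/mynode")  # served from alive_nodes
```

After the two calls, `store.slow_lookups` is 1 and `first is second` holds, which is the behaviour described above: the search cost is paid on first access only, for as long as the node stays in the cache.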

  1. http://en.wikipedia.org/wiki/B-tree
Anthony Scopatz