Time complexity of pytables File.get_node() operation

Question

What is the time complexity of the PyTables File operation get_node()?

Let's say I query

mynode = myfile.get_node(where='group0/group1/..../groupN', name='mynode')

How does this operation scale with N, the number of parent groups of mynode? Linearly, i.e. O(N); worse, e.g. O(N*d), where d is the average branching factor of my HDF5 node tree; or very fast, O(1), because PyTables internally keeps some sort of dictionary of all paths?

Thanks a lot!

The cache I think you are referring to acts only on File objects, not nodes in a file. — Anthony Scopatz, Sep 12 '13 at 00:20

score 1 · Accepted Answer · answered Sep 12 '13 at 00:20

HDF5 implements nodes as a B-tree, so get_node() has a time complexity of O(log N) [1]. PyTables does not preload these paths into a dictionary to make the lookup O(1). However, once a node has been loaded, it is tagged as 'alive' and goes into an alive_nodes dictionary, so subsequent access is O(1) as long as the node remains in memory. In effect this is a lazy O(1) operation: you pay the O(log N) cost once, up front.

[1] http://en.wikipedia.org/wiki/B-tree
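
To make the "pay once, then O(1)" idea concrete, here is a minimal toy sketch in plain Python. It is not PyTables code and does not reflect its internals: the class LazyTree, its _alive dictionary and the walks counter are all hypothetical names made up for illustration, and nested dicts stand in for on-disk groups (so the first lookup here is a simple walk, not a B-tree search).

```python
class LazyTree:
    """Toy stand-in for a file of nested groups (hypothetical, NOT PyTables)."""

    def __init__(self, tree):
        self._tree = tree   # nested dicts play the role of on-disk groups
        self._alive = {}    # path -> node, filled lazily on first access
        self.walks = 0      # number of lookups that had to walk the tree

    def get_node(self, path):
        # Fast path: the node was loaded before, so this is one dict lookup.
        if path in self._alive:
            return self._alive[path]
        # Slow path: resolve the path one component at a time, then cache it.
        self.walks += 1
        node = self._tree
        for part in path.strip("/").split("/"):
            node = node[part]
        self._alive[path] = node
        return node


tree = LazyTree({"group0": {"group1": {"mynode": [1, 2, 3]}}})
first = tree.get_node("/group0/group1/mynode")   # walks the tree
second = tree.get_node("/group0/group1/mynode")  # served from the cache
print(first is second, tree.walks)  # True 1
```

In this sketch only the first call for a given path pays the lookup cost; later calls for the same path return the cached object for as long as it stays in the dictionary.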
