I am using root_numpy and uproot to read a ROOT file containing about 8 million events. I understand that uproot is only fast when TBasket sizes are large. However, for a branch whose TBasket size is about 50 kB, uproot takes 70 s to read the branch while root_numpy takes only 45 s, which is puzzling since I would expect the reverse. Does anyone have suggestions as to what is going wrong and how I can fix it?

Here is my uproot code:

import uproot
# `file` holds the path to the ROOT file
ttree = uproot.open(file)["TTree"]
array = ttree.array("Branch")

And here is my root_numpy code:

import ROOT
import root_numpy
tfile = ROOT.TFile(file, "READ")  # open read-only
ttree = tfile.Get("TTree")
array = root_numpy.tree2array(ttree, branches="Branch")
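
A 70 s versus 45 s comparison is easier to interpret when both readers are timed the same way. The following is a minimal sketch of such a harness, not something taken from either library: `time_call` is a hypothetical helper name, and it assumes each reader has been wrapped in a zero-argument function.

```python
import time


def time_call(read, repeats=3):
    """Call `read()` `repeats` times; return (best wall-clock seconds, last result).

    `time_call` is a hypothetical helper, not part of uproot or root_numpy.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = read()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in reader so the sketch runs without a ROOT file; in practice,
# pass a function that wraps the uproot or root_numpy snippet instead.
seconds, values = time_call(lambda: list(range(1000)))
print(f"best of 3: {seconds:.6f} s for {len(values)} values")
```

Repeated reads of a local file are likely served from the operating system's page cache, so the first run and the later runs may differ considerably; it can be worth recording both.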
rak
  • Thanks for including basket size—that rules out one potential issue. The next thing I'm wondering about is data type: if the branch contains complex objects, they have to be deserialized in Python code for Uproot, C++ for ROOT. But then, I don't think root_numpy can expose complex objects in Python, and if it does so through PyROOT, that's not fast. Print `ttree["Branch"].interpretation`. The third thing I'm wondering about is if the `file` is local or remote. Uproot3 has inefficient remote file handling. – Jim Pivarski Jul 02 '20 at 10:35
  • There's a fair chance that what you're trying to do is already implemented in Uproot4. You could try `pip install uproot4` and accessing the branch data with the same syntax. You can also look at its `interpretation` and C++ `typename` (the latter is new). Additionally, toggling between `library="ak"` (Awkward1, also available in pip) and `library="np"` switches between output modes; the latter is more general but potentially slower. (For complex types, `library="ak"` is a placeholder for what will become C++ code, but it's Python for now.) It might not be faster, but it would give more info. – Jim Pivarski Jul 02 '20 at 10:41
  • Thanks for the suggestions, Jim! I tried it with uproot4, and at first glance it seems faster than root_numpy, but I still need to confirm that. Most of the branches are of type std::vector or , so I believe they are read as jagged arrays. One question about uproot4: it does not seem to let me use the TTree.array() method, and I cannot tell why. – rak Jul 03 '20 at 00:43
  • Also, I am not able to process some branches with uproot4; I get an `IndexError` instead, and I do not know what to do about that either. Those branches are of type vector> . Is this implemented in uproot4? – rak Jul 03 '20 at 00:53
  • vector> is definitely implemented, and if it weren't, the error you'd get would be a DeserializationError, so that's not the problem. I'd need more information to diagnose it. The good news, though, is that this explains the performance issue: there's a big difference in ROOT between vector and vector>. In Uproot 4, I've added a hook so that it can be replaced with a fast implementation, but that replacement hasn't happened yet. Put the full script and stack trace in a GitHub issue and I'll figure out why it's not working, though vector> won't be fast until the implementation is replaced. – Jim Pivarski Jul 04 '20 at 03:33
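
The diagnostics suggested in the comments (checking `interpretation` and `typename`, and switching between `library="np"` and `library="ak"`) might be combined as in the sketch below. It assumes Uproot 4 or later is importable as `uproot` (the comments install it as `uproot4`). `describe_branch` and `read_branch` are hypothetical helper names, and "TTree" and "Branch" are the placeholder names from the question. In Uproot 4 the `array()` method lives on the branch object rather than on the tree, which would explain why `TTree.array()` is not found.

```python
def describe_branch(branch):
    """Return a one-line summary of how a branch would be interpreted."""
    # `interpretation` is the attribute named in the comments;
    # `typename` is described there as new in Uproot 4, so fall back if absent.
    typename = getattr(branch, "typename", "<unavailable>")
    return f"typename={typename}, interpretation={branch.interpretation!r}"


def read_branch(path, tree_name="TTree", branch_name="Branch", library="np"):
    """Read one branch; `library` is "np" (NumPy) or "ak" (Awkward Array)."""
    # Imported lazily so the helpers can be defined without Uproot installed.
    # Assumption: this resolves to Uproot 4 or later.
    import uproot

    with uproot.open(path) as root_file:
        branch = root_file[tree_name][branch_name]
        print(describe_branch(branch))
        return branch.array(library=library)
```

Calling `read_branch(file, library="np")` and then `read_branch(file, library="ak")` on the same branch would show whether the output mode changes the read time, and the printed summary is the kind of detail that helps when reporting the `IndexError` in a GitHub issue.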
