I am trying to convert my flat ROOT ntuples into dataframes (via arrays). Currently, I am using root_numpy and would like to use uproot instead to avoid any ROOT dependencies.
I have a list of strings, vars_to_save, holding the names of the 20 variables I want to keep (out of 104 variables in the tree).
In root_numpy I use:
import ROOT
import pandas as pd
from root_numpy import tree2array

rootFile = ROOT.TFile(f)
intree = rootFile.Get(tree_name)
arr = tree2array(intree, branches=vars_to_save)
df_root_numpy = pd.DataFrame(arr)
This takes ~2 seconds (for 491176 events and the 20 variables listed in vars_to_save).
In uproot I have tried:
import uproot
import pandas as pd

# attempt 1
tree = uproot.open(f)[tree_name]
df_uproot = tree.pandas.df(vars_to_save)

# attempt 2
tree = uproot.open(f)[tree_name]
df_uproot = tree.arrays(vars_to_save, outputtype=pd.DataFrame)

# attempt 3
tree = uproot.open(f)[tree_name]
arr = tree.arrays(vars_to_save)
df_uproot = pd.DataFrame(arr)[vars_to_save]
Each of these takes ~45 seconds (around 20 times slower). In attempt 3, I notice that the tree.arrays() call is the slowest step, taking around 40 seconds.
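For reference, per-step timings like the ones above can be collected with a small helper along these lines. This is only a minimal sketch: the name `timed` and the `timings` dict are made up for illustration, and the loop bodies are stand-in workloads so the snippet runs without a ROOT file. In the real script the first block would contain the tree.arrays(vars_to_save) call and the second the pd.DataFrame(...) conversion.

```python
import time
from contextlib import contextmanager

timings = {}  # step label -> elapsed wall-clock seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under `label` (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-in for the read step; in the real script this would be
# something like: arr = tree.arrays(vars_to_save)
with timed("read"):
    arr = {name: list(range(1000)) for name in ("var_a", "var_b")}

# Stand-in for the conversion step; in the real script this would be
# something like: df_uproot = pd.DataFrame(arr)[vars_to_save]
with timed("convert"):
    n_rows = len(arr["var_a"])

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Timing the read and the conversion separately is what shows whether the time goes into reading the branches or into building the DataFrame.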
Is there a way in uproot to speed up this operation?