
Is there an equivalent of TTree::AddFriend() in uproot? I have 2 parallel trees in 2 different files, which I need to read with uproot.iterate using interpretations (set through the 'branches' option of uproot.iterate).

Maybe I can do that by manually obtaining several iterators from iterate() calls on the files and then calling next() on each iterator, but maybe there's a simpler way akin to AddFriend?

Thanks for any hint!

Edit: I'm not sure I was clear, so here are a few more details. My question is not about how to use the arrays, but about how to read them from different files. Here's a mockup of what I'm doing:

# I will fill this array and give it as input to my DNN
# it's very big so I will fill it in place

bigarray = ndarray( (2,numentries),...)

# get a handle on a tree, just to be able to build interpretations :
t0 = .. first tree in input_files
interpretations = dict(
    a=t0['a'].interpretation.toarray(bigarray[0]),
    b=t0['b'].interpretation.toarray(bigarray[1]),
    )
# iterate with :
uproot.iterate( input_files, treename,
                branches = interpretations )    

So what if a and b belong to 2 trees in 2 different files?

pseyfert
rdrien
  • There's no reason to mix them in the same interpretation or try to read them in a single command. Try reading `a` from one file independently of `b` from another file and later merge the arrays to make an input for your DNN. How easily you can do that depends on whether they represent the same events in the same order. If they don't, you'll need to do a `JOIN` as I've described below. – Jim Pivarski Feb 25 '20 at 19:41
  • The reason would be convenience, assuming that the events in the 2 trees are indeed aligned, which is the case TTree::AddFriend is designed for. Anyway, thanks again for the answers! – rdrien Feb 26 '20 at 08:59

2 Answers


In array-based programming, friends are implicit: you can JOIN any two columns after the fact—you don't have to declare them as friends ahead of time.

In the simplest case, if your arrays a and b have the same length and the same order, you can just use them together, like a + b. It doesn't matter whether a and b came from the same file or not. Even if one of them is jagged (like jets.phi) and the other is not (like met.phi), you're still fine because the non-jagged array will be broadcast to match the jagged one.
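As a minimal sketch of the aligned case, the snippet below uses two small NumPy arrays as stand-ins for branches read from two different files (the values are made up for illustration; in real use they would come from two separate uproot reads). It only shows that aligned arrays can be combined directly, including into the (2, numentries) layout from the question.

```python
import numpy as np

# Stand-ins for two branches read from two different files
# (illustrative values, not real data).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Same length and same order: entry i of `a` and entry i of `b`
# are assumed to describe the same event.
assert len(a) == len(b)

total = a + b                # elementwise; no "friend" declaration needed
bigarray = np.stack([a, b])  # shape (2, numentries), as in the question
```

This relies entirely on the two trees being written with the same events in the same order; nothing in the code can check that assumption beyond the length comparison.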

Note that awkward.Table and awkward.JaggedArray.zip can combine arrays into a single Table or jagged Table for bookkeeping.

If the two arrays are not in the same order, possibly because each writer was individually parallelized, then you'll need some column to act as the key associating rows of one array with the corresponding rows of the other. This is a classic database-style JOIN, and although Uproot and Awkward don't provide routines for it, Pandas does. (Look up "merging, joining, and concatenating" in the Pandas documentation; there's a lot!) You can maintain an array's jaggedness in Pandas by preparing the column with the awkward.topandas function.
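If you are not using Pandas, a key-based inner join can also be sketched in plain NumPy. This is a swapped-in alternative, not the Pandas merge described above, and it assumes each key (here a hypothetical event number) appears at most once per array. The array names and values are made up for illustration.

```python
import numpy as np

# Hypothetical key column (event number) and payload from each file,
# written in different orders.
event_a = np.array([3, 1, 2, 5])
a = np.array([30.0, 10.0, 20.0, 50.0])

event_b = np.array([2, 3, 4, 1])
b = np.array([0.2, 0.3, 0.4, 0.1])

# intersect1d returns the sorted common keys plus, for each input,
# the index at which each common key occurs.
common, idx_a, idx_b = np.intersect1d(event_a, event_b, return_indices=True)

# Reorder both payloads so that row i refers to event common[i].
a_joined = a[idx_a]
b_joined = b[idx_b]
```

Events present in only one of the inputs (5 and 4 here) are dropped, which corresponds to an inner join; keeping unmatched rows or handling duplicate keys is where a full Pandas merge becomes the better tool.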

The issue linked below covers many of these points, although the users there had to join sets of files rather than just a single tree. (In principle, a process would have to look ahead to all the files to see which contain which keys: a distributed database problem.) Even if that's not your case, you might find hints there on how to get started.

https://github.com/scikit-hep/uproot/issues/314

Jim Pivarski
  • Thanks so much for the answer Jim. I've edited my post because I'm not sure I've been clear enough. – rdrien Feb 25 '20 at 13:02
  • Forgot to say: I'm investigating the link you pointed to. But I'm not using pandas, so I'm not sure yet whether it's relevant to my case. – rdrien Feb 25 '20 at 13:08

This is how I have "friended" (befriended?) two TTrees in different files with uproot/awkward.

import awkward
import uproot

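# Use an integer step_size (entries per chunk) so both iterators yield chunks
# of the same length; the default is a memory size, which can give a different
# number of entries per chunk for each tree. 100000 is an arbitrary choice.
step = 100000
iterate1 = uproot.iterate(["file_with_a.root"], step_size=step) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"], step_size=step) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays: copy every field of array2 into array1
    for field in array2.fields:
        array1 = awkward.with_field(array1, array2[field], where=field)
    # array1 now has branch "a" and "b"
    print(array1.a)
    print(array1.b)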

Alternatively, if it is acceptable to "name" the trees,

import awkward
import uproot

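# Use an integer step_size (entries per chunk) so both iterators yield chunks
# of the same length; the default is a memory size, which can give a different
# number of entries per chunk for each tree. 100000 is an arbitrary choice.
step = 100000
iterate1 = uproot.iterate(["file_with_a.root"], step_size=step) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"], step_size=step) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays under one name per tree
    zippedArray = awkward.zip({"tree1": array1, "tree2": array2})
    # zippedArray now has branch "tree1.a" and "tree2.b"
    print(zippedArray.tree1.a)
    print(zippedArray.tree2.b)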

Of course, you can also use array1 and array2 together without merging them. But if you have already written code that expects a single Array, merging them like this can be useful.
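For the preallocated array in the question, merging may not be needed at all: each row can be filled from its own iterator. The sketch below shows that idea under stated assumptions. `fake_iterate` is a hypothetical stand-in for a per-file uproot.iterate call that yields 1-D chunks, and `fill_in_place` is an illustrative helper, not part of uproot or awkward.

```python
import numpy as np

def fill_in_place(target_row, chunks):
    """Copy successive 1-D chunks into target_row; return entries written."""
    start = 0
    for chunk in chunks:
        stop = start + len(chunk)
        target_row[start:stop] = chunk  # writes through the view into bigarray
        start = stop
    return start

def fake_iterate(values, step):
    """Hypothetical stand-in for uproot.iterate: yield chunks of `values`."""
    for i in range(0, len(values), step):
        yield values[i:i + step]

numentries = 5
bigarray = np.zeros((2, numentries))

# Each row is filled independently, so the chunk sizes need not match.
n_a = fill_in_place(bigarray[0], fake_iterate(np.arange(5.0), 2))
n_b = fill_in_place(bigarray[1], fake_iterate(np.arange(5.0) * 10, 3))

# The trees only line up if both produced the same number of entries.
assert n_a == n_b == numentries
```

Because the rows are written separately, this variant does not depend on the two iterators producing chunks of equal length, only on the trees having the same total number of entries in the same order.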

David Hadley