
I am trying to read 'hits' and 'trks' data as a MultiIndex DataFrame from the file in issue #390.

I managed to get data about all events into a DataFrame by opening the content of my tree as jagged arrays. Now I would like 'hits' and 'trks' to be read into a MultiIndex DataFrame, but I do not understand why tree.pandas.df("hits.*") and tree.pandas.df("trks.*") aren't working for me.

Issue 1 for hits:

Here is how I proceed:

tree = uproot.open(my_file)['E']
tree.pandas.df("hits.*")

This gives an empty AssertionError:

AssertionError: 

But when I try for example:

tree.pandas.df("hits.trig")

I do get a MultiIndex DataFrame with one column containing data read from tree["hits.trig"].
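Since a single branch such as "hits.trig" loads while the "hits.*" pattern fails, one way to narrow the problem down is to try each branch on its own. Below is a minimal sketch of that idea; the helper name find_failing_branches is hypothetical, it assumes only the tree.pandas.df(name) call used above, and the list of branch names has to be supplied by hand.

```python
def find_failing_branches(tree, names):
    """Call tree.pandas.df() on one branch at a time.

    `tree` is assumed to expose tree.pandas.df(name) as in the question;
    `names` is a list of branch names chosen by the caller.
    Returns (ok, failed): names that loaded, and a dict mapping each
    failing name to the repr of the exception it raised.
    """
    ok, failed = [], {}
    for name in names:
        try:
            tree.pandas.df(name)
            ok.append(name)
        except Exception as err:  # e.g. AssertionError or ValueError
            failed[name] = repr(err)
    return ok, failed


# Usage (not run here, needs the real file):
# tree = uproot.open(my_file)['E']
# ok, failed = find_failing_branches(tree, ["hits.trig"])
```

This only reports which branches fail when read individually; it does not explain why the combined read raises.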

Issue 2 for tracks:

Note: Issue 1 also applies to 'trks'. However, while I was able to access all 'hits' data as jagged arrays, this is not possible for some specific 'trks' branches. Here is how I proceed for these cases:

tree["trks.rec_stages"].interpretation

The output is: asjagged(asdtype('>i4'), 10)

Then, when I call:

tree.array("trks.rec_stages")

I get the following error:

ValueError: could not broadcast input array from shape (15713) into shape (15711)

I always get the above error for 'trks.rec_stages', 'trks.error_matrix', and 'trks.fitinfo' when using tree.array().
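For reference, this kind of ValueError is what NumPy raises when an array is assigned into a destination of a different length. The sketch below reproduces only that class of error with small made-up sizes (7 and 5); it is an assumption that the same kind of mismatch lies behind the message above, and it does not reproduce uproot's internals.

```python
import numpy as np

# Destination with room for 5 big-endian int32 values (sizes are made up).
dest = np.empty(5, dtype=">i4")
# Source with 7 values: two more than the destination can hold.
src = np.arange(7, dtype=">i4")

try:
    dest[:] = src  # lengths differ, so NumPy refuses the assignment
except ValueError as err:
    message = str(err)
    print(message)
```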

But when I try this:

lazy_rec_stages = tree.lazyarray("trks.rec_stages")

I get my data as follows:

<ChunkedArray [[1 3 5 ... 1 1 1] [1 3 5 ... 1 1 1] [1 3 5 ... 1 1 1] ... [1 3 5 ... 1 1 1] [1 3 5 ... 1 1 1] [1 3 5 ... 1 1 1]] at 0x7f4dabe12450>

However, the arrays in lazy_rec_stages read with uproot do not seem to preserve the structure of the data in the original ROOT file. To illustrate: each event has an associated number of tracks, and each track has its reconstruction stages stored in trks.rec_stages and a likelihood stored in trks.lik:

event       trks         trks.rec_stages        trks.lik
 0           0            "1 2 3"                 10
             1            "4 5"                   20   
             2            "6 7 8 9"               30

So one would expect:

tree.lazyarray("trks.rec_stages")[0][0]
Output: "1 2 3"
tree.lazyarray("trks.lik")[0][0] 
Output: 10

But that does not seem to be the case. Here is what I get:

tree.lazyarray("trks.rec_stages")[0][0]
Output: [1 2 3 4 5 6 7 8 9]

This makes it difficult to tell which rec_stages values belong to which track. Could you please tell me what I am doing wrong here?
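To make the mismatch concrete, here is a toy sketch in plain Python using the illustrative values from the table above. The per-track counts (3, 2, 4) come from that made-up table, not from the file, and split_by_counts is a hypothetical helper: it shows the per-track grouping I would expect, given a flat list of values for one event and the number of stages in each track.

```python
def split_by_counts(flat, counts):
    """Regroup a flat list of values into one sub-list per track."""
    if sum(counts) != len(flat):
        raise ValueError("counts do not add up to the number of flat values")
    tracks, start = [], 0
    for n in counts:
        tracks.append(flat[start:start + n])
        start += n
    return tracks


# What I currently get for event 0: all stages of all tracks in one run.
flat_event = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Hypothetical number of rec_stages per track, taken from the toy table.
counts = [3, 2, 4]

tracks = split_by_counts(flat_event, counts)
print(tracks)     # one sub-list per track
print(tracks[0])  # the stages I would expect from [0][0]
```

Without the per-track counts, the flat output alone is not enough to recover this grouping.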

PS: I think I am using the latest version of uproot.

Thank you for your time and consideration.

zineb a
  • This looks like two (distinct) GitHub Issues. Stackoverflow is for "how do I use it?" questions. – Jim Pivarski Nov 07 '19 at 11:57
  • Thank you for your quick reply. I created both issues [#398](https://github.com/scikit-hep/uproot/issues/398) and [#397](https://github.com/scikit-hep/uproot/issues/397) – zineb a Nov 07 '19 at 13:44

0 Answers