uproot: best way to load and replot a TH2 histogram from a .root file on a jupyter notebook

Question

I am very new to Python and uproot. Previously, I have been using ROOT in a C++ environment. Following the uproot tutorial, I can read my TH2D histograms from a .root file.

I now want to recreate and replot the existing histogram with matplotlib or seaborn, but I don't understand the structure of the imported TH2. myTH2D._members() correctly outputs:

['fName',
 'fTitle',
 'fLineColor',
 'fLineStyle',
 'fLineWidth',
 'fFillColor',
 'fFillStyle',
 'fMarkerColor',
 'fMarkerStyle',
 'fMarkerSize',
 'fNcells',
 'fXaxis',
 'fYaxis',
 'fZaxis',
 'fBarOffset',
 'fBarWidth',
 'fEntries',
 'fTsumw',
 'fTsumw2',
 'fTsumwx',
 'fTsumwx2',
 'fMaximum',
 'fMinimum',
 'fNormFactor',
 'fContour',
 'fSumw2',
 'fOption',
 'fFunctions',
 'fBufferSize',
 'fBuffer',
 'fBinStatErrOpt',
 'fScalefactor',
 'fTsumwy',
 'fTsumwy2',
 'fTsumwxy']

myTH2D.edges outputs the right axis and myTH2D.values outputs the right counts (confirmed with a rough plt.imshow(myTH2D.values)). The problems start when I call myTH2D.pandas():

                                                         count  variance
tof1 [ns]       tof2 [ns]
[-inf, 4500.0)  [-inf, 4500.0)                             0.0       0.0
                [4500.0, 4507.142857142857)                0.0       0.0
                [4507.142857142857, 4514.285714285715)     0.0       0.0
                [4514.285714285715, 4521.428571428572)     0.0       0.0
                [4521.428571428572, 4528.571428571428)     0.0       0.0
...             ...                                        ...       ...
[7500.0, inf)   [6971.428571428572, 6978.571428571429)     0.0       0.0
                [6978.571428571429, 6985.714285714286)     0.0       0.0
                [6985.714285714286, 6992.857142857143)     0.0       0.0
                [6992.857142857143, 7000.0)                0.0       0.0
                [7000.0, inf)                              0.0       0.0

123904 rows × 2 columns

and the tuple that is created with myTH2D.numpy() is nested in a way that I don't understand:

(array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]),
 [(array([4500.        , 4508.57142857, 4517.14285714, 4525.71428571,
          4534.28571429, 4542.85714286, 4551.42857143, 4560.        ,
          ...,
          7414.28571429, 7422.85714286, 7431.42857143, 7440.        ,
          7448.57142857, 7457.14285714, 7465.71428571, 7474.28571429,
          7482.85714286, 7491.42857143, 7500.        ]),
   array([4500.        , 4507.14285714, 4514.28571429, 4521.42857143,
          4528.57142857, 4535.71428571, 4542.85714286, 4550.        ,
          ...,
          6957.14285714, 6964.28571429, 6971.42857143, 6978.57142857,
          6985.71428571, 6992.85714286, 7000.        ]))])

Do you have any suggestions on how to handle this tuple?

Thank you!

EDIT:

With the following syntax, I can almost achieve the right plot, but it is flipped compared to the original:

plt.pcolormesh(myTH2D[1][0][0],myTH2D[1][0][1],myTH2D[0])

Nevertheless, my problem is still there: I'd like to process the data through pandas so that the axis labels are kept. Right now I don't know which is the x-axis and which is the y-axis. Any ideas?
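To make the nesting of the numpy() output concrete, here is a minimal sketch that rebuilds a tuple with the same shape as the one printed above, (counts, [(xedges, yedges)]), and unpacks it. The data are random stand-ins produced with numpy.histogram2d, not the real histogram. It is an assumption here that the counts array follows the numpy.histogram2d convention (x along the first axis), which would explain the flipped plot. The axis titles are taken from the pandas() output, with tof1 inferred to be x from the bin ranges. The helper plot_th2 is hypothetical.

```python
import numpy as np

# Stand-in for myTH2D.numpy(): same nesting as the output shown above,
# i.e. (counts, [(xedges, yedges)]). Built with NumPy so it runs without a ROOT file.
rng = np.random.default_rng(0)
x = rng.normal(6000.0, 300.0, size=10_000)  # hypothetical tof1 values [ns]
y = rng.normal(5750.0, 250.0, size=10_000)  # hypothetical tof2 values [ns]
counts, xedges, yedges = np.histogram2d(
    x, y, bins=(35, 25), range=[(4500.0, 7500.0), (4500.0, 7000.0)]
)
th2_numpy = (counts, [(xedges, yedges)])

# Element 0 is the 2D counts array; element 1 is a one-item list holding (xedges, yedges).
z, [(xe, ye)] = th2_numpy

# Counts are shaped (nx, ny); each edge array has one more entry than its number of bins.
print(z.shape, xe.shape, ye.shape)


def plot_th2(z, xe, ye):
    """Draw the histogram with matplotlib (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily so the unpacking runs without it

    fig, ax = plt.subplots()
    # pcolormesh expects the colour array shaped (ny, nx), hence the transpose.
    mesh = ax.pcolormesh(xe, ye, z.T)
    ax.set_xlabel("tof1 [ns]")  # assumed x-axis title
    ax.set_ylabel("tof2 [ns]")  # assumed y-axis title
    fig.colorbar(mesh, ax=ax, label="counts")
    return fig
```

With the real object, the same unpacking would read z, [(xe, ye)] = myTH2D.numpy(), followed by plot_th2(z, xe, ye).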

I would reshape myTH2D.numpy(), but I don't understand the structure itself: myTH2D.numpy()[0] holds the z values (counts) in a 350x350 matrix. myTH2D.numpy()[1] seems to hold both the x and y axes in one column with two rows, each of which is an array (?). – giammi56, Sep 04 '20 at 13:07
Reshaping is the right way, but converting to NumPy is the wrong strategy. Please consider converting to a DataFrame using pandas() and then unstacking the resulting object with a pivot table: https://stackoverflow.com/questions/63790713/uproot-processing-a-th2d-using-the-uproot-method-pandas – giammi56, Sep 08 '20 at 12:47
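As a rough illustration of the unstacking idea in the comment above, the sketch below builds a small stand-in for the pandas() output and pivots it into a 2D grid. The index layout (a two-level index of left-closed bin intervals, x outer and y inner, with underflow and overflow bins) is an assumption based on the printed DataFrame. The edges and counts are made up.

```python
import numpy as np
import pandas as pd

# Stand-in for myTH2D.pandas(): a two-level index of bin intervals (x outer, y inner),
# including underflow/overflow bins, with "count" and "variance" columns.
# The real index is assumed to look like this, based on the printed output.
xedges = np.array([-np.inf, 4500.0, 5500.0, 6500.0, 7500.0, np.inf])
yedges = np.array([-np.inf, 4500.0, 5750.0, 7000.0, np.inf])
index = pd.MultiIndex.from_product(
    [
        pd.IntervalIndex.from_breaks(xedges, closed="left"),
        pd.IntervalIndex.from_breaks(yedges, closed="left"),
    ],
    names=["tof1 [ns]", "tof2 [ns]"],
)
rng = np.random.default_rng(0)
counts = rng.poisson(3.0, size=len(index)).astype(float)  # made-up bin contents
df = pd.DataFrame({"count": counts, "variance": counts}, index=index)

# Pivot the long table into a grid: rows are tof1 bins, columns are tof2 bins.
grid = df["count"].unstack()

# Drop the underflow/overflow row and column to keep only the finite bins.
core = grid.iloc[1:-1, 1:-1]
print(core)
```

Because the row and column labels of grid are the bin intervals themselves, the x/y assignment stays attached to the data, which is the labelling the question asks for.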

score 1 · Answer 1 · answered Dec 09 '20 at 00:45

uproot3 understands numpy.histogram. So you can do:

import uproot3 as uproot
import numpy as np

x = np.random.normal(size=10000)
y = np.random.normal(size=10000)

f = uproot.recreate('example.root', compression=uproot.ZLIB(4))
f["h"] = np.histogram2d(x, y, 80)
f.close()

You should now have a TH2F named h in example.root.

score 0 · Accepted Answer · answered Sep 04 '20 at 12:07


From the arrays of edges and bin counts (myTH2D.numpy()), you could use any of these techniques to plot it in Matplotlib:

Python: Creating a 2D histogram from a numpy matrix

You mentioned Seaborn, but I'm less familiar with that. Surely it has similar functions.

On the bleeding edge, you could instead install uproot4 and hist>=2.0.0 (to get the hist prerelease), and then just

myTH2D.to_hist().plot()

The hist library aims to be a one-stop-shop for histogramming, and it's close to its first non-pre release. (The series starts at 2.0.0 because it took over the name of a no-longer-updated project. "hist" is too general of a name to lose!)

The Uproot 4 codebase is almost ready to replace the current Uproot; it needs documentation and file-writing capabilities. The interface is slightly different to address issues with Uproot 3's interface (e.g. strings vs bytestrings), so that's why this is being handled as a gradual transition with a temporarily different library name, rather than changing all at once. But if you're just starting out, you might want to start with the new library, so that you don't have to get used to a change in the near future (this fall).

answered Sep 04 '20 at 12:07

Jim Pivarski


Thank you for your answer! Before using one of those methods, which require x, y and z as arrays, I'd like to understand how myTH2D.numpy() is structured, and how you would extract the three arrays that are needed to plot the histograms. I will give uproot4 a try, but first I'd like to understand the structure of the tuple. – giammi56 Sep 04 '20 at 12:29

I forgot to mention the package "mplhep", which also maps NumPy-style histogram data into plots (without having to explicitly build a colormesh). I don't remember how underflow and overflow bins were handled in Uproot 3, but in Uproot 4, they're always included as the first and last bins (`0` and `-1`). In yesterday's commit to master, every function has been given a docstring, including histogram functions like `edges` and `values`, which should help to explain what all of these mean. – Jim Pivarski Sep 05 '20 at 13:24
"mplhep" is indeed the solution for the specific problem, but converting to NumPy is the wrong strategy. Please consider converting to a DataFrame using pandas() and then unstacking the resulting object with a pivot table. – giammi56 Sep 08 '20 at 12:46
You reminded me how important this function is and that we need to have it in Uproot 4: https://github.com/scikit-hep/uproot4/issues/91 – Jim Pivarski Sep 08 '20 at 13:29
Thank you for opening the issue. I wish I could participate, but I feel my knowledge is still too limited. – giammi56 Sep 08 '20 at 16:29
@Giammi can you clarify why using numpy is bad? I would assume that creating a pandas df will carry a performance burden – Andrzej Novák Oct 13 '20 at 09:50
I haven't tested the performance. I am just pointing out that, in my experience with the kind of large, complex data sets I was handling in relatively intricate tree structures, I found it more convenient to export the relevant information to a DataFrame for processing. – giammi56 Oct 14 '20 at 21:02
