My pyROOT analysis code is using huge amounts of memory. I have reduced the problem to the example code below:
from ROOT import TChain, TH1D
# Load file, chain
chain = TChain("someChain")
inFile = "someFile.root"
chain.Add(inFile)
nentries = chain.GetEntries()
# Declare histograms
h_nTracks = TH1D("h_nTracks", "h_nTracks", 16, -0.5, 15.5)
h_E = TH1D("h_E", "h_E", 100, -0.1, 6.0)
h_p = TH1D("h_p", "h_p", 100, -0.1, 6.0)
h_ECLEnergy = TH1D("h_ECLEnergy", "h_ECLEnergy", 100, -0.1, 14.0)
# Loop over entries
for jentry in range(nentries):
    # Load entry
    entry = chain.GetEntry(jentry)
    # Define variables
    cands = chain.__ncandidates__
    nTracks = chain.nTracks
    E = chain.useCMSFrame__boE__bc
    p = chain.useCMSFrame__bop__bc
    ECLEnergy = chain.useCMSFrame__boECLEnergy__bc
    # Fill histos
    h_nTracks.Fill(nTracks)
    h_ECLEnergy.Fill(ECLEnergy)
    for cand in range(cands):
        h_E.Fill(E[cand])
        h_p.Fill(p[cand])
where someFile.root is a ROOT file with 700,000 entries and multiple particle candidates per entry.
When I run this script it uses ~600 MB of memory. If I remove the line
h_p.Fill(p[cand])
it uses ~400 MB.
If I also remove the line
h_E.Fill(E[cand])
it uses ~150 MB.
If I also remove the lines
h_nTracks.Fill(nTracks)
h_ECLEnergy.Fill(ECLEnergy)
there is no further reduction in memory usage.
It seems that for every extra histogram that I fill of the form
h_variable.Fill(variable[cand])
(i.e. histograms that are filled once per candidate per entry, as opposed to histograms that are just filled once per entry) I use an extra ~200 MB of memory. This becomes a serious problem when I have 10 or more histograms because I am using GBs of memory and I am exceeding the limits of my computing system. Does anybody have a solution?
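Memory figures like the ones quoted above can also be read from inside the script, which makes it easier to compare runs with different lines removed. A minimal sketch, assuming Linux or macOS (the `resource` module is Unix-only); `peak_rss_mb` is a hypothetical helper name, and the list comprehension is only a stand-in for the event loop, not part of the original script:

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


before = peak_rss_mb()
payload = [float(i) for i in range(200000)]  # stand-in for the event loop
after = peak_rss_mb()
print("peak RSS: %.1f MB -> %.1f MB" % (before, after))
```

Because this reports the peak rather than the current usage, the value never decreases during a run; calling it once before and once after the loop shows how much the loop added.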
Update: I think this is a Python 3 problem.
If I take the script in my original post (above) and run it with Python 2, the memory usage is ~200 MB, compared to ~600 MB with Python 3. Even if I try to replicate Problem 2 by using the long variable names, the job still uses only ~200 MB of memory with Python 2, compared to ~1.3 GB with Python 3.
During my Googling I came across a few other accounts of people encountering memory leaks when using pyROOT with Python 3. It seems this is still an issue as of Python 3.6.2 and ROOT 6.08/06, and that for the moment you must use Python 2 if you want to use pyROOT.
So, using Python 2 appears to be my "solution" for now, but it's not ideal. If anybody has any further information or suggestions, I'd be grateful to hear from you!
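For anyone trying to narrow this down further, one possible check is whether the growth shows up in Python-level allocations at all. The sketch below uses the standard-library `tracemalloc` module (Python 3 only) to measure how much Python-tracked memory a loop body retains. It is an illustration under assumptions, not a diagnosis: `python_heap_growth`, `leaky_step` and `clean_step` are hypothetical names, and `tracemalloc` only sees memory requested through Python's own allocators, so memory allocated inside ROOT's C++ code would not appear in these numbers.

```python
import tracemalloc


def python_heap_growth(step, iterations):
    """Return the net growth, in bytes, of Python-tracked memory over a loop."""
    tracemalloc.start()
    start_bytes, _ = tracemalloc.get_traced_memory()
    for i in range(iterations):
        step(i)
    end_bytes, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return end_bytes - start_bytes


kept = []  # hypothetical leak: references that are never released


def leaky_step(i):
    # Each call keeps a new 1000-byte buffer alive.
    kept.append(bytearray(1000))


def clean_step(i):
    # The buffer is released as soon as the call returns.
    bytearray(1000)


leaky_growth = python_heap_growth(leaky_step, 1000)
clean_growth = python_heap_growth(clean_step, 1000)
print("leaky:", leaky_growth, "bytes; clean:", clean_growth, "bytes")
```

To apply this to the script in the question, the body of the entry loop would be wrapped in a function and passed as `step`. If the reported growth stays small while the process's total memory keeps climbing, that would suggest the memory is being held outside Python's allocator.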