I am trying to load a Higgs boson dataset with uproot. I am not familiar with uproot or the .root file format. I am using the following code, which comes from what appear to be the official instructions for loading data with the library. I have created a virtual environment and installed the necessary libraries.

import oamap.source.root
import uproot

events = uproot.open("http://scikit-hep.org/uproot/examples/HZZ.root")["events"].oamap()

I am getting the following error while running the above sample code.

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    events = uproot.open("http://scikit-hep.org/uproot/examples/HZZ.root")["events"].oamap()
  File "/home/akash/DIANAhep/lib/python3.5/site-packages/oamap/source/root.py", line 187, in __call__
    generator = self.schema.generator()
  File "/home/akash/DIANAhep/lib/python3.5/site-packages/oamap/source/root.py", line 184, in schema
    return oamap.schema.List(recurse(self.tree), starts="", stops="")
  File "/home/akash/DIANAhep/lib/python3.5/site-packages/oamap/source/root.py", line 138, in recurse
    x = frominterp(name, branch, uproot.interp.auto.interpret(branch))
  File "/home/akash/DIANAhep/lib/python3.5/site-packages/oamap/source/root.py", line 100, in frominterp
    return oamap.schema.Primitive(interpretation.todtype, data=name)
  File "/home/akash/DIANAhep/lib/python3.5/site-packages/oamap/schema.py", line 346, in __init__
    self.data = data
  File "/home/akash/DIANAhep/lib/python3.5/site-packages/oamap/schema.py", line 418, in data
    raise TypeError("data must be None or an array name (string), not {0}".format(repr(value)))
TypeError: data must be None or an array name (string), not b'NJet'

I am a novice at loading this sort of dataset.

Jim Pivarski
Akash Kumar

1 Answer

Python 2 and 3 differ in their treatment of byte strings versus unicode strings: Python 2 implicitly converts between them (weakly, dynamically typed), but Python 3 raises an error when they are mixed (strongly, dynamically typed).
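To make the difference concrete, here is a small sketch of how Python 3 treats the two types. The variable names are only for illustration; the value b"NJet" is the one reported in the traceback.

```python
# The same four characters as a byte string and as a unicode string.
name_bytes = b"NJet"
name_text = "NJet"

# In Python 3 a bytes object is not a str, and the two never compare equal.
is_str = isinstance(name_bytes, str)
same = (name_bytes == name_text)

# Mixing them in an operation raises TypeError instead of converting silently.
try:
    name_text + name_bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(is_str, same, mixed_ok)
```

Under Python 3 all three values come out false, which is the kind of strictness that surfaces as the TypeError in the question.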

Names that come from ROOT files are byte strings because ROOT provides no encodings. They're all just "char *". The appropriate Python type is byte string.

OAMap just doesn't want to deal with this: its array names are strings, meaning unicode text. In Python 2, an encoding is implicitly assigned to make this true; Python 3 is stricter about how the encoding is assigned. OAMap's connector to ROOT might be missing cases to handle unencoded byte strings.
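The kind of case that may be missing can be sketched as an explicit decode step. This is not OAMap's or uproot's code: to_text is a hypothetical helper, the sample names are illustrative, and the choice of UTF-8 is an assumption, since ROOT itself does not declare an encoding.

```python
def to_text(name, encoding="utf-8"):
    """Return name as str, decoding it first if it is a byte string.

    Hypothetical helper; the default encoding is an assumption,
    because ROOT names carry no encoding information.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name

# Illustrative names: two byte strings and one that is already text.
raw_names = [b"NJet", b"Jet_Px", "already_text"]
text_names = [to_text(n) for n in raw_names]

print(text_names)
```

A conversion like this would have to be applied wherever a ROOT name is handed to code that insists on str, which is why it is easy for a connector to miss a case.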

Switch to Python 2 for an easy fix.

Akash Kumar