I am trying to read a large dataset as lazy arrays with uproot.lazyarrays(), doing the following:
>>> import uproot
>>> import numpy as np
>>> file_path = "~/data.root"
>>> data = uproot.lazyarrays(file_path, "E")
>>> hits = data['hits']
>>> hits
<ChunkedArray [176 125 318 ... 76 85 51] at 0x7fb8612a8390>
>>> np.array(hits)
array([176, 125, 318, ..., 76, 85, 51], dtype=int32)
So, as you can see, the 'hits' branch can be read both as a lazy array and as a plain NumPy array without issues. But when I try the same steps for a different branch, I get a ValueError. Here is how I proceed:
>>> data['hits.dom_id']
ValueError: value too large
However, when I access 'hits.dom_id' through the branch's .array() method, I get my data. Here is how I proceed:
>>> data2 = uproot.open(file_path)['E']['Evt']['hits']
>>> data2['hits.dom_id'].array()
<JaggedArray [[806451572 806451572 806451572 ... 809544061 809544061 809544061] [806451572 806451572 806451572 ... 809524432 809526097 809544061] [806451572 806451572 806451572 ... 809544061 809544061 809544061] ... [806451572 806451572 806451572 ... 809006037 809524432 809544061] [806451572 806451572 806451572 ... 809503416 809503416 809544058] [806451572 806465101 806465101 ... 809544058 809544058 809544061]] at 0x7fb886cbbbd0>
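For what it's worth, the eagerly-read JaggedArray is fully usable once loaded; this assumes awkward 0's JaggedArray API, which is what uproot 3 returns:

dom_ids = data2['hits.dom_id'].array()
counts = dom_ids.counts     # per-event lengths (number of hits per event)
flat = dom_ids.flatten()    # all dom_ids concatenated into one flat numpy array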
I have noticed (though maybe this is just a coincidence) that whenever a branch's data comes back in JaggedArray format, uproot.lazyarrays() raises the same ValueError.
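To make the pattern concrete, here is the comparison in one place, using the same file and branch names as above:

import uproot
import numpy as np

data = uproot.lazyarrays("~/data.root", "E")

np.array(data['hits'])    # flat branch: works fine
data['hits.dom_id']       # jagged branch: raises ValueError: value too large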
I might be doing something wrong here; could you please help?
Note: I don't think it's a RAM issue. I tried playing with the cache size, using a cache larger than my entire dataset, and uproot.lazyarrays() still raised the ValueError.
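For reference, this is roughly what my cache test looked like (a minimal sketch; as far as I know, any MutableMapping works as a cache in uproot 3, so a plain dict acts as an effectively unlimited cache):

import uproot

cache = {}   # plain dict: no eviction, effectively unlimited
data = uproot.lazyarrays("~/data.root", "E", cache=cache)
data['hits.dom_id']   # still raises ValueError: value too large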
Thank you!