How to reconstruct the raw data from a histogram?

Question

I need to recover the "raw data" from a timing histogram provided by a timing counter as a .csv file.

I've got the code below, but the actual data has several thousand counts in each bin, so the nested for loop takes a very long time. Is there a better way?

import numpy as np

# Example histogram with 1 second bins
hist = np.array([[1., 2., 3., 4., 5., 6., 7., 8., 9., 10.], [0, 17, 3, 34, 35, 100, 101, 107, 12, 1]])

# Array for bins and counts
time_bins = hist[0]
counts = hist[1]

# Empty data to append
data = np.empty(0)

for i in range(np.size(counts)):
    # counts holds floats (hist is a float array), so cast before passing to range()
    for j in range(int(counts[i])):
        data = np.append(data, [time_bins[i]])

I get that the resolution of the raw data will be the smallest time bin but that is fine for my purposes. In the end, this is to be able to produce another histogram with logarithmic bins, which I am able to do with the raw data.

EDIT

The code I'm using to load the CSV is

x = np.loadtxt(fname, delimiter=',', skiprows=1).T 
a = x[0] 
b = x[1] 

data = np.empty(0) 
for i in range(np.size(b)): 
    for j in range(int(b[i])):  # np.int was removed in NumPy 1.24; the built-in int does the same cast
        data = np.append(data, [a[i]])

Can you show us what your CSV looks like and roughly how large the file is? And post the code you're using that involves the `np.loadtxt()`? — m13op22, Jul 30 '19 at 18:44
I ask this because I think the issue is with how you're loading the data, which will influence how you reconstruct the histogram. — m13op22, Jul 30 '19 at 19:00
`x = np.loadtxt(fname, delimiter=',', skiprows=1).T` `a = x[0]` `b = x[1]` `data = np.empty(0)` `for i in range(np.size(b)):` `for j in range(np.int(b[i])):` `data = np.append(data, [a[i]])` — MrGodlikePro, Jul 30 '19 at 19:15

score 0 · Answer 1 · answered Jul 30 '19 at 18:33

Especially if there's a lot of data, copying the array on each iteration (which is what np.append does, since NumPy arrays can't be resized in place) will be costly. Try allocating first, with one slot per count (i.e. data = np.zeros(int(np.sum(counts)))), and then just assigning to it.
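A minimal sketch of that preallocation, assuming the example histogram from the question; the names `total` and `pos` are illustrative only. The output needs one slot per recorded event, so its length is the sum of the counts, and each bin is written as a slice rather than one element at a time.

```python
import numpy as np

# Example histogram from the question, split into bins and counts
time_bins = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
counts = np.array([0, 17, 3, 34, 35, 100, 101, 107, 12, 1])

# Allocate once: one output slot per recorded event
total = int(counts.sum())
data = np.empty(total)

pos = 0  # next free index in data
for t, n in zip(time_bins, counts):
    n = int(n)              # counts loaded from a CSV may arrive as floats
    data[pos:pos + n] = t   # fill n consecutive slots with this bin's time
    pos += n
```

This still loops over the bins in Python, but only once per bin rather than once per count, and the array is never copied.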

I'm also not sure what your innermost for loop is doing, since each iteration appends the same thing?

JoshuaF

The innermost loop is there to add the right amount of each time interval. For example, a time interval of 4 seconds happened 34 times, so it needs to be there 34 times. – MrGodlikePro Jul 30 '19 at 18:47

score 0 · Accepted Answer · 2019-07-30T19:25:27.357

You can do this with a list comprehension and the numpy concatenation:

import numpy as np
hist = np.array([[1., 2., 3., 4., 5., 6., 7., 8., 9., 10.], [0, 17, 3, 34, 35, 100, 101, 107, 12, 1]])
new_array = np.concatenate([[hist[0][i]]*int(hist[1][i]) for i in range(len(hist[0]))])
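As an alternative sketch (not part of the answer as posted), `np.repeat` performs the same expansion without building intermediate Python lists. It needs integer repeat counts, so the float counts are cast first. The log-spaced bin edges at the end only illustrate the re-binning the question mentions; using 6 edges between 1 and 10 is an arbitrary assumption.

```python
import numpy as np

hist = np.array([[1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
                 [0, 17, 3, 34, 35, 100, 101, 107, 12, 1]])

time_bins = hist[0]
counts = hist[1].astype(int)  # np.repeat requires integer repeat counts

# Repeat each bin time as many times as it was counted
raw = np.repeat(time_bins, counts)

# Illustrative log-spaced edges from 1 to 10 (arbitrary choice of 6 edges)
edges = np.logspace(0, 1, 6)
log_counts, _ = np.histogram(raw, bins=edges)

# The same re-binning without reconstructing the raw data:
# histogram the bin times directly and weight each by its count
log_counts_w, _ = np.histogram(time_bins, bins=edges, weights=counts)
```

If the only goal is the re-binned histogram, the `weights` form avoids allocating the expanded array at all, which may matter when each bin holds thousands of counts.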

edited Jul 30 '19 at 19:25

answered Jul 30 '19 at 18:46

This seems like it could work, but I'm having trouble since my time intervals are floats. I've edited the code I posted, and I'm seeing what I can do with this. – MrGodlikePro Jul 30 '19 at 19:13
I've updated my answer. If you make sure to cast the counts as `int` it will work. – Jul 30 '19 at 19:26
