I am having issues using NumPy's histogram function on a particular data set.
The call is very slow (several minutes) and uses a very large amount of memory. Memory usage peaks at about 12 GB, ramps down to ~750 MB, then climbs back up into the high GBs, and this cycle seems to repeat endlessly. Even if I let it run to completion, it takes several minutes and ends with a MemoryError.
All of this happens when I pass in a (very) small data set such as the one below (26 elements):
import numpy as np

vals = np.array([2.400000024000011e-05, 2.4000000240000108e-05,
                 2.400000024000011e-05, 2.400000024000012e-05,
                 2.4000000240000105e-05, 2.4000000240000105e-05,
                 2.400000024000009e-05, 2.400000024000012e-05,
                 2.400000024000012e-05, 2.400002024000031e-05,
                 2.4000000240000145e-05, 2.400000024000012e-05,
                 2.400000024000012e-05, 2.4000000240000064e-05,
                 2.400000024000012e-05, 2.400000024000012e-05,
                 2.400000024000012e-05, 2.400000024000012e-05,
                 2.400000024000012e-05, 2.400000024000012e-05,
                 2.400000024000001e-05, 2.400000024000012e-05,
                 2.4000020240000364e-05, 2.400000024000012e-05,
                 2.400000024000012e-05, 2.400000024000012e-05], dtype='float64')
I assume part of the slowdown comes from hitting the physical memory limit and then being bottlenecked by swapping.
The histogram calculation is as follows:
histY, histX = np.histogram(vals, bins='auto')
where vals is the example NumPy array provided above.
Note the very small min-max range in this case: 2.0000000353764813e-11.
My quick guess: the histogram function is stuck in some iterative optimization to find the best bin width versus bin count for this data set, and it struggles with the tiny min-max range.
The error I receive when it finally ends:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".....\lib\site-packages\numpy\lib\histograms.py", line 737, in histogram
    n = np.zeros(n_equal_bins, ntype)
MemoryError
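One way to check how many bins bins='auto' would request for this data, without actually allocating them, is to evaluate the bin-width rules by hand. This is a minimal sketch that re-implements the formulas the NumPy documentation gives for the estimators ('fd': 2 * IQR * n^(-1/3); 'sturges': range / (log2(n) + 1); 'auto': the smaller of the two widths, falling back to Sturges when the IQR is zero, as far as I know). It is an approximation of NumPy's internals, not the library's own code, and the helper name auto_bin_estimate is hypothetical.

```python
import numpy as np

# The 26 values from the question.
vals = np.array([
    2.400000024000011e-05, 2.4000000240000108e-05,
    2.400000024000011e-05, 2.400000024000012e-05,
    2.4000000240000105e-05, 2.4000000240000105e-05,
    2.400000024000009e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.400002024000031e-05,
    2.4000000240000145e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.4000000240000064e-05,
    2.400000024000012e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.400000024000012e-05,
    2.400000024000001e-05, 2.400000024000012e-05,
    2.4000020240000364e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.400000024000012e-05,
], dtype=np.float64)


def auto_bin_estimate(x):
    """Hypothetical helper: estimate the bin count bins='auto' would request."""
    n = x.size
    span = x.max() - x.min()                   # min-max range, about 2e-11 here
    sturges_width = span / (np.log2(n) + 1.0)  # Sturges: depends only on n and range
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    fd_width = 2.0 * iqr * n ** (-1.0 / 3.0)   # Freedman-Diaconis: depends on the IQR
    # 'auto' uses the smaller width; Sturges alone if the IQR is exactly 0.
    width = min(fd_width, sturges_width) if fd_width > 0 else sturges_width
    return {
        "iqr": iqr,
        "sturges_bins": int(np.ceil(span / sturges_width)),
        "auto_bins": int(np.ceil(span / width)),
    }


est = auto_bin_estimate(vals)
print(f"IQR: {est['iqr']:.3e}")
print(f"Sturges bins: {est['sturges_bins']}")
print(f"'auto' bins: {est['auto_bins']:,}")
# The counts array alone holds one 8-byte integer per bin on a 64-bit build.
print(f"Counts array alone: ~{est['auto_bins'] * 8 / 1e9:.1f} GB")
```

Because 15 of the 26 values are identical and the quartiles land on neighbouring values, the IQR here is on the order of 1e-20, many orders of magnitude smaller than the min-max range. The Freedman-Diaconis width therefore wins and the estimate runs into billions of bins, which would be consistent with the failure at np.zeros(n_equal_bins, ntype) in the traceback.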
Could someone please explain what is really happening here and what can be done to circumvent the issue?
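A minimal sketch of two possible workarounds, assuming a rule-based or fixed bin count is acceptable for this data: ask for the 'sturges' estimator, which uses only the sample size and the min-max range and so cannot be blown up by a tiny IQR, or pass an explicit integer number of bins. The value 20 below is an arbitrary example, not a recommendation.

```python
import numpy as np

# The 26 values from the question.
vals = np.array([
    2.400000024000011e-05, 2.4000000240000108e-05,
    2.400000024000011e-05, 2.400000024000012e-05,
    2.4000000240000105e-05, 2.4000000240000105e-05,
    2.400000024000009e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.400002024000031e-05,
    2.4000000240000145e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.4000000240000064e-05,
    2.400000024000012e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.400000024000012e-05,
    2.400000024000001e-05, 2.400000024000012e-05,
    2.4000020240000364e-05, 2.400000024000012e-05,
    2.400000024000012e-05, 2.400000024000012e-05,
], dtype=np.float64)

# Option 1: Sturges' rule, independent of the interquartile range.
counts_s, edges_s = np.histogram(vals, bins="sturges")
print(len(counts_s), counts_s)

# Option 2: an explicit number of equal-width bins (20 is an arbitrary choice).
counts_f, edges_f = np.histogram(vals, bins=20)
print(len(counts_f), counts_f)
```

Both calls return almost immediately with a handful of bins. Which bin count is appropriate depends on what the histogram is used for; the point is only to avoid letting the IQR-based estimate choose the count for data this tightly clustered.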