
I want to quantize a series of numbers, whose maximum and minimum values are X and Y respectively, into an arbitrary number of bins. For instance, if the maximum value of my array is 65535 and the minimum is 0 (do not assume these are all integers) and I want to quantize the values into 2 bins, all values greater than floor(65535/2) would become 65535 and the rest would become 0. The same applies if I want to quantize the array into any number of bins from 1 to 65535. Is there an efficient and easy way to do this? If not, how can I do it efficiently when the number of bins is a power of 2? Pseudocode would be fine, but Python + NumPy is preferred.
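For the power-of-two case, one possible shortcut applies only when the data happen to be unsigned 16-bit integers in [0, 65535] (the question says not to assume this in general): the top k bits of each value are already its index among 2**k equal-width bins. A minimal sketch; `quantize_uint16_pow2` is a hypothetical helper name, and mapping bin indices to evenly spaced output levels is an assumption about the desired output.

```python
import numpy as np


def quantize_uint16_pow2(values, k):
    """Quantize 16-bit unsigned integers into 2**k bins (hypothetical helper).

    Only valid for integer data in [0, 65535]; float data needs a
    different approach.
    """
    if not 1 <= k <= 16:
        raise ValueError("k must be between 1 and 16")
    v = np.asarray(values, dtype=np.uint16)
    n_bins = 2 ** k
    # The top k bits of a 16-bit value are its bin index (0 .. n_bins - 1)
    idx = v >> (16 - k)
    # Output levels spread evenly over [0, 65535], rounded back to integers
    levels = np.round(np.arange(n_bins) * 65535 / (n_bins - 1)).astype(np.uint16)
    # Replace every bin index with its level in one lookup
    return levels[idx]
```

With k = 1, values up to 32767 map to 0 and values from 32768 up map to 65535, matching the 2-bin example in the question.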

Amir
  • Yes sure, sorry my bad. Updated my post – Amir May 09 '18 at 08:26
  • It seems like the obvious thing to do, which you mention, is floor(x/(range/nbins)). Why is that not a desirable solution? It seems easy and, as for efficiency, I don't see how you would improve on it. – Robert Dodier May 11 '18 at 19:12
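The floor formula in the comment can be applied to a whole array at once. A minimal sketch, assuming floating-point data; it subtracts the minimum before dividing, clips the maximum value (which would otherwise land one past the last bin) into the last bin, and maps bin indices to evenly spaced levels between the minimum and maximum. `quantize` and its `lo`/`hi` parameters are hypothetical names.

```python
import numpy as np


def quantize(values, n_bins, lo=None, hi=None):
    """Snap each value to one of n_bins evenly spaced levels in [lo, hi].

    Hypothetical helper; lo and hi default to the data's own min and max.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    x = np.asarray(values, dtype=float)
    lo = x.min() if lo is None else float(lo)
    hi = x.max() if hi is None else float(hi)
    width = (hi - lo) / n_bins
    # floor((x - lo) / width) is the bin index; x == hi would give n_bins,
    # so clip it into the last bin
    idx = np.clip(np.floor((x - lo) / width), 0, n_bins - 1)
    # Bin 0 maps to lo, bin n_bins - 1 maps to hi
    return lo + idx * (hi - lo) / (n_bins - 1)


print(quantize([0, 32767, 32768, 65535], 2, lo=0, hi=65535))
```

With the question's 2-bin example, 0 and 32767 map to 0, while 32768 and 65535 map to 65535. Everything is vectorized, so the work is a few passes over the array regardless of the number of bins.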

1 Answer


It's not the most elegant solution, but:

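import numpy as np

MIN_VALUE = 0
MAX_VALUE = 65535
NO_BINS = 2

# Create random dataset from [0,65535] interval (as floats, so non-integer bin values are not truncated)
numbers = np.random.randint(MIN_VALUE, MAX_VALUE + 1, 100).astype(float)

# Create the lower edges of NO_BINS equal-width bins
# (linspace avoids the extra edge np.arange can produce with a float step)
bins = np.linspace(MIN_VALUE, MAX_VALUE, NO_BINS, endpoint=False)

# Bin index of each value, from 1 to NO_BINS
digits = np.digitize(numbers, bins)

# Get bin values: the edges returned by np.histogram are NO_BINS evenly spaced
# values from MIN_VALUE to MAX_VALUE
_, bin_val = np.histogram(numbers, NO_BINS - 1, range=(MIN_VALUE, MAX_VALUE))

# Change the values to the bin value
for iter_bin in range(1, NO_BINS + 1):
    numbers[digits == iter_bin] = bin_val[iter_bin - 1]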
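Since np.digitize already returns a bin index for every value, the per-bin loop can be replaced by a single fancy-indexing lookup. A sketch assuming float data and 63 bins; `levels` is a hypothetical name playing the role of `bin_val`, built with np.linspace instead of np.histogram, and every value is assumed to be at least MIN_VALUE (anything smaller would get index -1 and wrap around to the last level).

```python
import numpy as np

MIN_VALUE = 0.0
MAX_VALUE = 65535.0
NO_BINS = 63

# Float test data in [MIN_VALUE, MAX_VALUE)
rng = np.random.default_rng(0)
numbers = rng.uniform(MIN_VALUE, MAX_VALUE, 1000)

# Lower edge of each of the NO_BINS equal-width bins
bins = np.linspace(MIN_VALUE, MAX_VALUE, NO_BINS, endpoint=False)
# One output level per bin, evenly spaced from MIN_VALUE to MAX_VALUE
levels = np.linspace(MIN_VALUE, MAX_VALUE, NO_BINS)

# np.digitize gives 1..NO_BINS for values >= MIN_VALUE; shift to 0-based
# indices and look up all levels at once instead of looping over the bins
quantized = levels[np.digitize(numbers, bins) - 1]
```

The lookup makes one pass over the data instead of one pass per bin, which matters when the bin count is large.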

UPDATE

Does the same job with pandas (assuming numbers, NO_BINS, MIN_VALUE and MAX_VALUE are defined as above):

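import pandas as pd
import numpy as np

# or bin_labels = [MIN_VALUE + i * (MAX_VALUE - MIN_VALUE) / (NO_BINS - 1) for i in range(NO_BINS)]
_, bin_labels = np.histogram(numbers, NO_BINS - 1, range=(MIN_VALUE, MAX_VALUE))

# With an integer bin count, pd.cut splits the data's own min..max range into NO_BINS equal-width bins
quantized = pd.cut(numbers, NO_BINS, right=False, labels=bin_labels)
# pd.cut returns a Categorical; convert it back to a plain float array
quantized = np.asarray(quantized, dtype=float)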
Aechlys
  • I don't have time to test this now; I'll test it soon. Before I do, I just want to make sure: does your code work with an arbitrary number of bins? What if I want 63 bins? What if the values of the main array are floats? – Amir May 09 '18 at 19:07
  • You can define the number of bins using the `NO_BINS` variable. It works with floats, as well. Obviously I haven't written extensive tests. – Aechlys May 09 '18 at 19:16