I wanted to program bucket sort for a normal distribution (instead of the uniform distributed generally assumed by bucket sort). When I looked into the math, I came up with the following:
def bucket_sort(numbers):
mean = sum(numbers) / len(numbers)
variance = sum((x - mean) ** 2 for x in numbers) / len(numbers)
std_deviation = math.sqrt(variance)
num_buckets = int( math.sqrt(len(numbers)) )
buckets = [[] for _ in range(num_buckets)]
for num in numbers:
index = int( ( (num - mean) / std_deviation) * num_buckets)
buckets[index].append(num)
for bucket in buckets:
bucket.sort()
sorted_numbers = [num for bucket in buckets for num in bucket]
return sorted_numbers
For simplicity I assume, that numbers
only consists of positive integers. Is this adaption of bucket sort correct? I was not able to find good literature for that. Additionally I assume, that this version can be speed up, by using a smaller sample to compute mean
and variance
. Is this assumption also correct?