
I am looking for a Python library or function that splits an integer into a list of integers (whole numbers) according to a discrete distribution.

For example, 10 with the distribution [0.2, 0.3, 0.5] should result in [2, 3, 5], and 14 with the same distribution should result in [3, 4, 7]. Similarly:
1 => [0, 0, 1]
2 => [0, 1, 1]
3 => [1, 1, 1] or [0, 1, 2]

In any case, the output list should sum up to the given input.

I can write a custom function to do this, but I don't want to miss an obvious library. I've looked for such a function in numpy/scipy but couldn't find one.

Suba Selvandran

1 Answer


You could combine a couple of numpy functions, but you would need to express your distribution a little differently, as a cumulative distribution:

import numpy as np

dist    = [0.2, 0.3, 0.5]        # must add up to exactly 1
cumDist = np.cumsum([0] + dist)  # express as a cumulative from zero

for N in range(1, 15):
    # scale the cumulative edges by N, round them, then take differences
    print(N, np.diff(np.rint(cumDist*N)))

1 [0. 0. 1.]
2 [0. 1. 1.]
3 [1. 1. 1.]
4 [1. 1. 2.]
5 [1. 1. 3.]
6 [1. 2. 3.]
7 [1. 3. 3.]
8 [2. 2. 4.]
9 [2. 2. 5.]
10 [2. 3. 5.]
11 [2. 4. 5.]
12 [2. 4. 6.]
13 [3. 3. 7.]
14 [3. 4. 7.]
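The rounding approach above can be wrapped in a small helper that returns plain Python integers. This is only a sketch: the name `split_by_distribution` is made up for illustration, and dividing by the last cumulative value is my own addition so that weights which do not sum to exactly 1 still produce a list that sums to the input.

```python
import numpy as np

def split_by_distribution(total, dist):
    """Split `total` into integers roughly proportional to `dist` (hypothetical helper)."""
    cum = np.cumsum([0.0] + list(dist))  # cumulative edges, starting at zero
    cum = cum / cum[-1]                  # assumption: normalize so the last edge is exactly 1.0
    edges = np.rint(cum * total).astype(int)  # round each edge to a whole number
    return np.diff(edges).tolist()       # differences between edges are the parts

print(split_by_distribution(10, [0.2, 0.3, 0.5]))  # [2, 3, 5]
print(split_by_distribution(14, [0.2, 0.3, 0.5]))  # [3, 4, 7]
```

Because the first edge is always 0 and the last edge is always `total`, the parts sum to `total` regardless of how the intermediate edges round.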

If you express your distribution as integers (e.g. percentages), you can get numpy integers as output (without using np.rint):

dist    = [20, 30, 50]           # integer percentages
cumDist = np.cumsum([0] + dist)  # express as a cumulative 

for N in range(1,15):
    print(N,np.diff(cumDist*N//cumDist[-1]))

1 [0 0 1]
2 [0 1 1]
3 [0 1 2]
4 [0 2 2]
5 [1 1 3]
6 [1 2 3]
7 [1 2 4]
8 [1 3 4]
9 [1 3 5]
10 [2 3 5]
11 [2 3 6]
12 [2 4 6]
13 [2 4 7]
14 [2 5 7]

Note that this version floors the calculations (integer division) instead of rounding them, so the distribution is slightly different. It will also work with non-integer distributions (e.g. 0.2, 0.3, 0.5), but then it won't return integer data types.
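If neither the rounded nor the floored split suits you, a different technique (not the cumulative approach used above) is the largest remainder method: floor every share, then hand the leftover units to the parts with the biggest fractional remainders. The sketch below is one possible way to write it; `largest_remainder_split` is a hypothetical name, and it assumes non-negative weights and does not guard against floating-point edge cases.

```python
import numpy as np

def largest_remainder_split(total, weights):
    """Split `total` by flooring shares, then topping up the largest remainders."""
    weights = np.asarray(weights, dtype=float)
    quotas = weights / weights.sum() * total   # ideal (fractional) shares
    parts = np.floor(quotas).astype(int)       # start from the floored shares
    leftover = int(total - parts.sum())        # units still to hand out
    # indices sorted by fractional remainder, largest first
    order = np.argsort(quotas - parts, kind="stable")[::-1]
    parts[order[:leftover]] += 1               # one extra unit each
    return parts.tolist()

for N in (1, 2, 3, 10, 14):
    print(N, largest_remainder_split(N, [0.2, 0.3, 0.5]))
```

For the distribution in the question this gives [0, 0, 1], [0, 1, 1], [1, 1, 1], [2, 3, 5] and [3, 4, 7] for inputs 1, 2, 3, 10 and 14.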

Alain T.
  • Thanks for the answer, Alain. I thought this was a simple use case and that there would be a straightforward function for it. It goes very deep. – Suba Selvandran Sep 02 '21 at 07:17