
So let's say we have a code block that we want to execute 70% of the time and another one 30% of the time.

if(Math.random() < 0.7)
    70percentmethod();
else
    30percentmethod();

Simple enough. But what if we want it to be easily expandable to, say, 30%/60%/10% etc.? Here every change would require adding and adjusting all the if statements, which isn't exactly great to use: it's slow and mistake-inducing.

So far I've found large switches to be decently useful for this use case, for example:

switch(rand(0, 10)){
    case 0:
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
    case 6:
    case 7:70percentmethod();break;
    case 8:
    case 9:
    case 10:30percentmethod();break;
}

Which can be very easily changed to:

switch(rand(0, 10)){
    case 0:10percentmethod();break;
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
    case 6:
    case 7:60percentmethod();break;
    case 8:
    case 9:
    case 10:30percentmethod();break;
}

But these have their drawbacks as well, being cumbersome and tied to a predetermined number of divisions.

Something ideal would be based on a "frequency number" system I guess, like so:

(1,a),(1,b),(2,c) -> 25% a, 25% b, 50% c

then if you added another one:

(1,a),(1,b),(2,c),(6,d) -> 10% a, 10% b, 20% c, 60% d

So you'd simply add up the numbers, treat the sum as 100%, and then split it up accordingly.
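
Roughly, I imagine the lookup itself would be something like this quick, untested sketch (the weights and the index-based selection are just placeholders for the idea):

int[] weights = {1, 1, 2};                        // (1,a),(1,b),(2,c)
int total = 1 + 1 + 2;                            // 4, so 25%/25%/50%
int roll = new java.util.Random().nextInt(total); // 0..3
int index = 0;
while (roll >= weights[index]) {                  // walk the buckets until the roll lands in one
    roll -= weights[index];
    index++;
}
// index is now 0, 1 or 2 with probability 25%, 25%, 50%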

I suppose it wouldn't be that much trouble to make a handler for it with a customized hashmap or something, but I'm wondering if there's some established way/pattern or lambda for it before I go all spaghetti on this.

Moff Kalast
  • Not sure you can do this with random numbers, it could be 100% of the time below or above 0.7 i guess. – Viezevingertjes Aug 23 '17 at 10:09
  • @Viezevingertjes that would depend on whether you want to guarantee that in 10 runs, both are run exactly 7 and 3 times, or just want them to have that probability of being run. – JAD Aug 23 '17 at 13:29
  • 3
    Note that `rand(0,10)` give 11 possible values, and your "60%" is really 70%, making the total 110%. – CJ Dennis Aug 24 '17 at 03:05
  • Next question: how do we code the `60percent100percentMethod()`? – Cliff AB Aug 24 '17 at 03:41
  • This question is a duplicate of so many other questions... People should really search before posting! – Olivier Grégoire Aug 24 '17 at 11:59
  • @OlivierGrégoire link just one... – Mischa Sep 20 '17 at 10:48
  • @MischaBehrend [1](https://stackoverflow.com/q/6409652/180719), [2](https://stackoverflow.com/q/6737283/180719), [3](https://stackoverflow.com/q/9330394/180719), [4](https://stackoverflow.com/q/28660401/180719). Oh sorry... You wanted "just one". – Olivier Grégoire Sep 20 '17 at 12:11
  • @OlivierGrégoire 4 are also fine... :) But for the next time: Please link at least one dupe example, when you make the claim that a question is a dupe of another. – Mischa Sep 20 '17 at 12:36
  • Possible duplicate of [Random weighted selection in Java](https://stackoverflow.com/questions/6409652/random-weighted-selection-in-java) – izstas Sep 20 '17 at 19:19

7 Answers

28

EDIT: See the edit at the end for a more elegant solution. I'll leave this in though.

You can use a NavigableMap to store these methods mapped to their percentages.

NavigableMap<Double, Runnable> runnables = new TreeMap<>();

runnables.put(0.3, this::30PercentMethod);
runnables.put(1.0, this::70PercentMethod);

public static void runRandomly(Map<Double, Runnable> runnables) {
    double percentage = Math.random();
    for (Map.Entry<Double, Runnable> entry : runnables.entrySet()) {
        if (entry.getKey() >= percentage) {
            entry.getValue().run();
            return; // make sure you only call one method
        }
    }
    throw new RuntimeException("map not filled properly for " + percentage);
}

// or, because I'm still practicing streams by using them for everything
public static void runRandomly(Map<Double, Runnable> runnables) {
    double percentage = Math.random();
    runnables.entrySet().stream()
        .filter(e -> e.getKey() >= percentage)
        .findFirst()
        .orElseThrow(() ->
                new RuntimeException("map not filled properly for " + percentage))
        .getValue().run();
}

The NavigableMap is sorted by its keys (a HashMap, for example, gives no ordering guarantees for its entries), so you get the entries ordered by their percentages. This is relevant because if you have two items (3,r1),(7,r2), they result in the entries r1 = 0.3 and r2 = 1.0, and they need to be evaluated in this order (if they were evaluated in the reverse order, the result would always be r2).

As for the splitting, it should go something like this: With a Tuple class like this

static class Pair<X, Y>
{
    public Pair(X f, Y s)
    {
        first = f;
        second = s;
    }

    public final X first;
    public final Y second;
}

You can create a map like this

// the parameter contains the (1,m1), (1,m2), (3,m3) pairs
private static Map<Double,Runnable> splitToPercentageMap(Collection<Pair<Integer,Runnable>> runnables)
{

    // this adds all Runnables to lists of same int value,
    // overall those lists are sorted by that int (so least probable first)
    double total = 0;
    Map<Integer,List<Runnable>> byNumber = new TreeMap<>();
    for (Pair<Integer,Runnable> e : runnables)
    {
        total += e.first;
        List<Runnable> list = byNumber.getOrDefault(e.first, new ArrayList<>());
        list.add(e.second);
        byNumber.put(e.first, list);
    }

    Map<Double,Runnable> targetList = new TreeMap<>();
    double current = 0;
    for (Map.Entry<Integer,List<Runnable>> e : byNumber.entrySet())
    {
        for (Runnable r : e.getValue())
        {
            double percentage = (double) e.getKey() / total;
            current += percentage;
            targetList.put(current, r);
        }
    }

    return targetList;
}

And all of this added to a class

class RandomRunner {
    private List<Pair<Integer, Runnable>> runnables = new ArrayList<>();
    public void add(int value, Runnable toRun) {
        runnables.add(new Pair<>(value, toRun));
    }
    public void remove(Runnable toRemove) {
        for (Iterator<Pair<Integer, Runnable>> r = runnables.iterator();
            r.hasNext(); ) {
            if (toRemove == r.next().second) {
               r.remove();
               break;
            }
        }
    }
    public void runRandomly() {
        // split list, use code from above
    }
}

EDIT :
Actually, the above is what you get if you get an idea stuck in your head and don't question it properly. Keeping the RandomRunner class interface, this is much easier:

class RandomRunner {
    List<Runnable> runnables = new ArrayList<>();
    public void add(int value, Runnable toRun) {
        // add the methods as often as their weight indicates.
        // this should be fine for smaller numbers;
        // if you get lists with millions of entries, optimize
        for (int i = 0; i < value; i++) {
            runnables.add(toRun);
        }
    }
    public void remove(Runnable r) {
        Iterator<Runnable> myRunnables = runnables.iterator();
        while (myRunnables.hasNext()) {
            if (myRunnables.next() == r) {
                myRunnables.remove();
            }
        }
    }
    public void runRandomly() {
        if (runnables.isEmpty()) return;
        // roll n-sided die
        int runIndex = ThreadLocalRandom.current().nextInt(0, runnables.size());
        runnables.get(runIndex).run();
    }
}
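
A hypothetical usage of that class might then be (the method names are placeholders, since the ones from the question aren't valid Java identifiers):

RandomRunner runner = new RandomRunner();
runner.add(7, this::seventyPercentMethod); // weight 7
runner.add(3, this::thirtyPercentMethod);  // weight 3
runner.runRandomly();                      // runs one of the two with roughly 70%/30% probability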
daniu
  • 7
    30PercentMethod and 70PercentMethod are not valid Java method names – Michael Aug 23 '17 at 10:32
  • 7
    @Michael you are correct. I was just reusing the names OP gave in his question. – daniu Aug 23 '17 at 10:33
  • If it doesn't matter when the sum of the elements adds up to 1 then is there a reason to not actually use a basic `HashMap`? How/will it be different when the sum doesn't add up to 1? I'd assume it always does, doesn't it? It could be worth adding to the answer just to specify it though. – Mibac Aug 23 '17 at 12:50
  • @Mibac no, I don't think it changes behaviour if you use a HashMap, but I only realized this later. I started only proposing using NavigableSet because it most closely resembles the case statement in the question. I'll leave the NavigableMap in though if only because not everybody knows it ;) – daniu Aug 23 '17 at 12:59
  • @Mibac no, wait, it does matter of course. The percentage is determined by the difference between an entry and the previous entry's value, so the order is relevant. Since you have two entries (0,3 - x1) and (1,0 - x2), you need to iterate them in that order or x2 will always get called. So NavigableMap is required for the iteration order. – daniu Aug 23 '17 at 13:23
  • @Mibac the splitting should do this, it's why I keep the double current that adds up the probabilities of each entry. – daniu Aug 23 '17 at 13:43
  • There's an edge case in your code. Even though `byNumber` has a `List` as the map's type it's not going to do much because of this: `targetList.put(current, r);`. `put`ting something to a `Map` overwrites the previous value so it's unnecessary and throwing an exception or noting this in the answer seems relevant. (e.g. when input is: `{3=X, 3=Y}` the output will be `{1=X}` (or `Y`, depends how exactly it'll work)) – Mibac Aug 23 '17 at 13:54
  • @Mibac You're right. It really was meant to be a sketch, but now that you kept asking about it, I finally did compile and run it. I've adjusted the answer. There were more serious bugs like the current/total being the wrong way around, and you need to cast to double to prevent integer promotion, but it seems to work now. – daniu Aug 23 '17 at 14:25
  • 4
    I'm surprised this answer has been so well-received (no offense). If you had more methods to add to the map, it really would not be obvious at all how likely each method would occur - you'd need to subtract every value from the one immediately preceding it. It also doesn't allow you to have two methods weighted the same. – Michael Aug 23 '17 at 16:10
  • 2
    It does allow multiple equally-weighted outcomes, because the keys are *cumulative* probabilities. So if you wanted outcomes A,B,C with probabilities 0.25, 0.25, 0.5 then you would have (0.25,A), (0.5,B), and (1.0,C). – Gareth McCaughan Aug 23 '17 at 18:47
  • 2
    @Michael I'm somewhat surprised myself. I did add a simpler solution to the answer now which should address your concern. – daniu Aug 24 '17 at 06:42
  • @daniu The second solution is really simple and is O(1), however I suspect the *space* complexity of this solution is O(N!) or something that isn't polynomial (though N wouldn't be the number of elements, it would be the difference between the largest factor and the smallest factor or something). For OP's base case this will work fine, however imagine that OP needed 1/3 and others that are multiples of 10%. This would work, but now you would need 30 elements. If OP wanted something with a 1% chance of happening and also stuff with 10%, you would need 100, and if you wanted a 1/3 chance as well, still 300. Still a good alternative – Krupip Aug 24 '17 at 13:12
  • @snb yeah like I write in the comment, if there are too many cases, there's need to optimize. You could maintain a list of Pair with the first pair item denoting the i in (i,m1). That would make it O(N) with the number of Runnables, but would make runRandomly() harder to read. But either solution will beat a bunch of case statements in a switch ;) – daniu Aug 24 '17 at 13:28
  • So... if I have a million more chances that an event is happening, you eat my memory like no one? An alias method beats this hands down. – Olivier Grégoire Sep 20 '17 at 12:25
27

All these answers seem quite complicated, so I'll just post the keep-it-simple alternative:

double rnd = Math.random();
if((rnd -= 0.6) < 0)
    60percentmethod();
else if ((rnd -= 0.3) < 0)
    30percentmethod();
else
    10percentmethod();

It doesn't need changes to the other lines, and one can quite easily see what happens without digging into auxiliary classes. A small downside is that it doesn't enforce that the percentages sum to 100%.

jpa
  • 1
    Why not `if(rnd < 0.6)` ? – user121330 Aug 23 '17 at 21:37
  • 2
    Having `if(rnd < 0.6)` would mean having the next if as `if(rnd < 0.9)`, i.e. keeping track of the sum of percentages from the earlier ifs. Not a problem with only three or four options, but imagine if you had 30 and then changed the weight of one of the first; you'd have to change the weights of every subsequent if statement. This way each weight is only tied to its own if statement, except the else at the end of course – JChristen Aug 24 '17 at 09:49
  • Check this code: https://ideone.com/Lsjo8e — you can get around 10-12 percent for the else, 25-32 percent for the `else if ((rnd -= 0.3) < 0)`, and the rest for the `if ((rnd -= 0.6) < 0)`. – Dhaval dave Dec 06 '22 at 14:53
  • @Dhavaldave That's normal for random values - if you increase your loop count to e.g. 10000, the variance will decrease. – jpa Dec 06 '22 at 15:03
  • @jpa : I have run this and another algo, e.g. https://ideone.com/AoBH84, for 10, 100, 1000, 10000; it's always around 10% deviation. I agree that for random values it can be there, but is there any more accurate algo or code? – Dhaval dave Dec 07 '22 at 06:24
  • 1
    @Dhavaldave You may wish to read up on [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution). But for me e.g. 10000 produces results such as 10 -> 1021 30 -> 3022 60 -> 5957 which is around 2% differences. If you want exact proportions, fill up an array with the number of items you want and shuffle it. – jpa Dec 07 '22 at 08:36
16

I am not sure if there is a common name for this, but I think I learned it as the wheel of fortune back in university.

It basically just works as you described: It receives a list of values and "frequency numbers" and one is chosen according to the weighted probabilities.

list = (1,a),(1,b),(2,c),(6,d)

total = list.sum()
rnd = random(0, total)
sum = 0
for i from 0 to list.size():
    sum += list[i]
    if sum >= rnd:
        return list[i]
return list.last()

The list can be a function parameter if you want to generalize this.

This also works with floating point numbers and the numbers don't have to be normalized. If you normalize (to sum up to 1 for example), you can skip the list.sum() part.

EDIT:

Due to demand, here is an actual compiling Java implementation and usage example:

import java.util.ArrayList;
import java.util.Random;

public class RandomWheel<T>
{
  private static final class RandomWheelSection<T>
  {
    public double weight;
    public T value;

    public RandomWheelSection(double weight, T value)
    {
      this.weight = weight;
      this.value = value;
    }
  }

  private ArrayList<RandomWheelSection<T>> sections = new ArrayList<>();
  private double totalWeight = 0;
  private Random random = new Random();

  public void addWheelSection(double weight, T value)
  {
    sections.add(new RandomWheelSection<T>(weight, value));
    totalWeight += weight;
  }

  public T draw()
  {
    double rnd = totalWeight * random.nextDouble();

    double sum = 0;
    for (int i = 0; i < sections.size(); i++)
    {
      sum += sections.get(i).weight;
      if (sum >= rnd)
        return sections.get(i).value;
    }
    return sections.get(sections.size() - 1).value;
  }

  public static void main(String[] args)
  {
    RandomWheel<String> wheel = new RandomWheel<String>();
    wheel.addWheelSection(1, "a");
    wheel.addWheelSection(1, "b");
    wheel.addWheelSection(2, "c");
    wheel.addWheelSection(6, "d");

    for (int i = 0; i < 100; i++)
        System.out.print(wheel.draw());
  }
}
SteakOverflow
  • 15
    True, but this is more of a general question. I am sure you know how to implement this in Java... – SteakOverflow Aug 23 '17 at 10:11
  • 1
    Cool, and excellent point with the floats. It would be great to have an option to set some values lower than 1 for ultra low chance branches. Still, this isn't exactly what I'm looking for, the backend is easy enough. I'm more interested in the way how to link it up to the part where you actually make the list in an efficient way. – Moff Kalast Aug 23 '17 at 10:15
  • 2
    I am certain that Java programmers can read pseudo code and convert it into the required SingletonRunnerFactory invocations. – ndim Aug 23 '17 at 17:36
  • 1
    @Michael: then I tell you one: whoever writes the answer might not be absolutely sure of the idiom, yet have a good solution that the OP (or anybody else who comes along) can use. Surely: a programmer who can't understand a piece of code in pseudocode or in a different but very similar language to the one they're currently using is not a programmer. Really. – Gábor Aug 23 '17 at 21:06
  • @HongOoi Care to elaborate what you think is missing? – SteakOverflow Aug 24 '17 at 05:40
  • Anyone fluent enough in Java to propose an edit that translates the example code to Java? – cmaster - reinstate monica Aug 24 '17 at 08:15
  • @SteakOverflow nothing is missing, it's a joke. – Hong Ooi Aug 24 '17 at 09:04
  • @Michael: It might be tagged Java, however, the question is "Coding pattern for random percentage branching?", not "Code for [...]". IMHO, the provided code in this answer is short and concise enough to be converted easily by any Java, C++, C#, Haskell or even Lua programmer. I would prefer if you improved your own answer instead of criticizing minor and obvious issues in the other answers. Contrary to that, your answer binds way too much to a specific implementation, whereas SteakOverflow's answer is trivially converted to loops, recursion, lists, arrays, integers, methods, etc. – Sebastian Mach Aug 24 '17 at 09:07
  • Cool. Thanks for the edit. I've removed my comments. As is now evident, it's significantly more verbose in Java (isn't everything?), which I personally think is a relevant consideration. It also makes it easier to contrast it with everyone else's answers. – Michael Aug 24 '17 at 09:42
8

While the selected answer works, it is unfortunately asymptotically slow for your use case. Instead, you could use something called Alias Sampling. Alias sampling (or the alias method) is a technique used for selecting elements from a weighted distribution. If the weights of those elements don't change, you can do selection in O(1) time! If this isn't the case, you can still get amortized O(1) time if the ratio between the number of selections you make and the changes you make to the alias table (changing the weights) is high. The currently selected answer suggests an O(N) algorithm; the next best thing is O(log(N)) given sorted probabilities and binary search, but nothing is going to beat the O(1) time I suggested.

This site provides a good overview of the alias method that is mostly language agnostic. Essentially you create a table where each entry represents the outcome of two probabilities. There is a single threshold for each entry in the table: below the threshold you get one value, above it you get another value. You spread the larger probabilities across multiple table entries in order to create a probability graph with an area of one for all probabilities combined.

Say you have the outcomes A, B, C, and D, with probabilities 0.1, 0.1, 0.1 and 0.7 respectively. The alias method would spread the probability of 0.7 across the others. One index would correspond to each probability, where you would have 0.1 of A and 0.15 of D in A's index (and likewise for B and C), and 0.25 in D's index. With this you normalize each probability so that you end up with a 0.4 chance of getting A and a 0.6 chance of getting D in A's index (0.1/(0.1 + 0.15) and 0.15/(0.1 + 0.15) respectively), the same for B's and C's indices, and a 100% chance of getting D in D's index (0.25/0.25 is 1).

Given an unbiased uniform PRNG (Math.random()) for indexing, you get an equal probability of choosing each index, but you also do a coin flip per index, which provides the weighted probability. You have a 25% chance of landing on the A or D slot, but within A's slot you only have a 40% chance of picking A and a 60% chance of picking D. 0.40 * 0.25 = 0.1, our original probability, and if you add up all of D's probabilities strewn throughout the other indices, you would get 0.70 again.

So to do random selection, you only need to generate a random index from 0 to N and then do a coin flip; no matter how many items you add, this is very fast and has constant cost. Making an alias table doesn't take that many lines of code either: my Python version takes 80 lines including import statements and line breaks, and the version presented in the Pandas article is similarly sized (and it's C++).

For your Java implementation, one could map the sampled indices to the functions that must be executed, creating a list of functions that is indexed by the sample; alternatively you could use function objects (functors) which have a method you call to pass parameters in and execute.

ArrayList<YourFunctionObject> function_list;
// add functions
AliasSampler aliasSampler = new AliasSampler(listOfProbabilities);
// somewhere later with some type T and some parameter values.
int index = aliasSampler.sampleIndex();
T result = function_list.get(index).apply(parameters);

EDIT:

I've created a Java version of the AliasSampler as a class; it exposes the sampleIndex method and should be usable as shown above.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Random;

public class AliasSampler {
    private ArrayList<Double> binaryProbabilityArray;
    private ArrayList<Integer> aliasIndexList;
    AliasSampler(ArrayList<Double> probabilities){
        // java 8 needed here
        assert(probabilities.stream().mapToDouble(Double::doubleValue).sum() == 1.0);
        int n = probabilities.size();
        // probabilityArray is the list of probabilities, this is the incoming probabilities scaled
        // by the number of probabilities.  This allows us to figure out which probabilities need to be spread 
        // to others since they are too large, ie [0.1 0.1 0.1 0.7] = [0.4 0.4 0.4 2.80]
        ArrayList<Double> probabilityArray = new ArrayList<>();
        for(Double probability : probabilities){
            probabilityArray.add(probability * n);
        }
        binaryProbabilityArray = new ArrayList<Double>(Collections.nCopies(n, 0.0));
        aliasIndexList = new ArrayList<Integer>(Collections.nCopies(n, 0));
        ArrayList<Integer> lessThanOneIndexList = new ArrayList<Integer>();
        ArrayList<Integer> greaterThanOneIndexList = new ArrayList<Integer>();
        for(int index = 0; index < probabilityArray.size(); index++){
            double probability = probabilityArray.get(index);
            if(probability < 1.0){
                lessThanOneIndexList.add(index);
            }
            else{
                greaterThanOneIndexList.add(index);
            }
        }

        // while we still have indices to check for in each list, we attempt to spread the probability of those larger
        // what this ends up doing in our first example is taking greater than one elements (2.80) and removing 0.6, 
        // and spreading it to different indices, so (((2.80 - 0.6) - 0.6) - 0.6) will equal 1.0, and the rest will
        // be 0.4 + 0.6 = 1.0 as well. 
        while(lessThanOneIndexList.size() != 0 && greaterThanOneIndexList.size() != 0){
            //https://stackoverflow.com/questions/16987727/removing-last-object-of-arraylist-in-java
            // last element removal is equivalent to pop, java does this in constant time
            int lessThanOneIndex = lessThanOneIndexList.remove(lessThanOneIndexList.size() - 1);
            int greaterThanOneIndex = greaterThanOneIndexList.remove(greaterThanOneIndexList.size() - 1);
            double probabilityLessThanOne = probabilityArray.get(lessThanOneIndex);
            binaryProbabilityArray.set(lessThanOneIndex, probabilityLessThanOne);
            aliasIndexList.set(lessThanOneIndex, greaterThanOneIndex);
            probabilityArray.set(greaterThanOneIndex, probabilityArray.get(greaterThanOneIndex) + probabilityLessThanOne - 1);
            if(probabilityArray.get(greaterThanOneIndex) < 1){
                lessThanOneIndexList.add(greaterThanOneIndex);
            }
            else{
                greaterThanOneIndexList.add(greaterThanOneIndex);
            }
        }
        //if there are any probabilities left in either index list, they can't be spread across the other
        //indices, so they are set with probability 1.0. They still have the probabilities they should at this step; it works out mathematically.
        while(greaterThanOneIndexList.size() != 0){
            int greaterThanOneIndex = greaterThanOneIndexList.remove(greaterThanOneIndexList.size() - 1);
            binaryProbabilityArray.set(greaterThanOneIndex, 1.0);
        }
        while(lessThanOneIndexList.size() != 0){
            int lessThanOneIndex = lessThanOneIndexList.remove(lessThanOneIndexList.size() - 1);
            binaryProbabilityArray.set(lessThanOneIndex, 1.0);
        }
    }
    public int sampleIndex(){
        int index = new Random().nextInt(binaryProbabilityArray.size());
        double r = Math.random();
        if( r < binaryProbabilityArray.get(index)){
            return index;
        }
        else{
            return aliasIndexList.get(index);
        }
    }

}
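
To tie this back to the question, a hypothetical way to wire the sampler to a list of branches (using the (1,a),(1,b),(2,c) weights from the question, i.e. 0.25/0.25/0.5) could look like this; it additionally needs java.util.Arrays:

// hypothetical usage sketch
ArrayList<Double> probabilities = new ArrayList<>(Arrays.asList(0.25, 0.25, 0.5));
ArrayList<Runnable> branches = new ArrayList<>(Arrays.asList(
        () -> System.out.println("a"),
        () -> System.out.println("b"),
        () -> System.out.println("c")));
AliasSampler sampler = new AliasSampler(probabilities);
branches.get(sampler.sampleIndex()).run(); // a, b or c with 25%/25%/50% probability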
Krupip
  • Interesting answer, but basically a premature optimisation. Always favour simplicity until you *know* something is a performance problem. – Michael Aug 23 '17 at 16:13
  • 3
    @Michael While I generally I agree with that principle, I'd argue in this case that actually implementing the alias table is simple enough that it isn't that big of a deal, and usage is actually much simpler than the marked answer. Additionally the question asked the "defacto coding pattern" for what he was talking about, I'd argue that ***alias method is that coding pattern***. So while the top answer gives a simple solution, it also isn't the standard for such a problem. I'd argue this is analogous to using arrays when linked lists or hash-tables should be used. – Krupip Aug 23 '17 at 16:40
  • 2
    @Michael To avoid sounding defensive I should reiterate that I agree in principle, and to summarize, the largest reason I think this doesn't fall under premature optimization is that in my opinion alias method is the standard coding pattern OP appeared to ask for. – Krupip Aug 23 '17 at 16:44
  • @Michael the implementation shown in the article is not the best, it implements some dubious "improvements", and the author is not a well-studied programmer (primary experience is Biology). It just happened to be one of the better explanations of the method out there. [Here is an example](https://github.com/Cazadorro/StackGP/blob/master/stackgp/sampling/aliassampling.py) of a simple implementation in Python (lines 7 to 42) that's 35 lines of code including new lines, yours takes 37, it would be slightly smaller with removing the {,}s. It isn't hard to understand, it isn't hard to implement. – Krupip Aug 23 '17 at 17:41
  • 2
    @Michael Also my method is guaranteed to be at least as fast as the example you provided even in the very rare best case: Math.random() to index, and compare value, return index, execute based on index. That is it. In any circumstance where you must execute more than one iteration for a search it is always faster. This isn't some theoretical Fibonacci Heap where it *could* get better if given large enough N because of the high constant cost, you can prove it yourself just by understanding how this works. – Krupip Aug 23 '17 at 17:50
  • The line count of mine, excluding braces (if we're going to compare to Python), is 23. The bulk of the "logic" happens in 8 statements. You've yet to provide me with an example of the alias method with a readable code sample - if you need an article to explain it, you're doing it wrong. As I mentioned in my first comment, we haven't established that performance is a problem so the fact that your solution may be faster by a nanosecond or two is so far completely irrelevant. – Michael Aug 23 '17 at 18:12
  • 2
    @Michael, the need to explain something doesn't invalidate it on any level. Hash tables are difficult to explain, yet I doubt you would cast the same judgment on their usage. Additionally I've explained the algorithm here so you wouldn't even need to look at the article, and the Python code I listed is not hard to understand; I simply cannot relate to your confusion on that part. And like I said, 23 vs 35? Not that much smaller. Again, the perf gain isn't insignificant, it's several orders of magnitude, and it's obvious because it's O(1) with roughly constant cost; OP is no doubt loop sampling also. – Krupip Aug 23 '17 at 18:27
  • You're not listening to what I'm saying. "the need to explain something doesn't invalidate it" **"always favour simplicity"** "perfgain isn't insignificant" **"until you *know* something is a performance problem"** I think we're done here because I'll only be repeating myself again. – Michael Aug 23 '17 at 19:10
  • 2
    @Michael I disagree that the notion that you need to explain something stops it from being simple, I also disagree that we don't know this is a performance problem. Also you decided you were "done" before I even replied to you. – Krupip Aug 23 '17 at 19:23
  • @snb check out my updated answer, that should be O(1) as well. I didn't follow your Alias Sampling link, but my guess is it's somewhat like I implemented now. – daniu Aug 24 '17 at 06:50
  • @daniu Note, if you had actually implemented Alias sampling I might have had to report your answer. You aren't supposed to edit other peoples answers into your own. However as it stands, it is not at all like alias sampling, I'll comment on your answer on things I see with the edited solution. – Krupip Aug 24 '17 at 13:01
  • @snd well since your answer only mentions Alias sampling instead of actually implementing it, it's not as if it would have been editing someone else's answer into mine had I implemented it. – daniu Aug 24 '17 at 13:04
  • @daniu No, what you are supposed to do in that situation is edit my answer if you feel you need to add to it, not edit what would have been additions to my answer in yours Or you would post an entirely different answer to the question referencing my answer, but you wouldn't edit that into your answer you had when it had nothing to do with your original post. – Krupip Aug 24 '17 at 13:15
  • @snd a good point, forgot about the possibility to edit others' answers. – daniu Aug 24 '17 at 13:24
6

You could compute the cumulative probability for each class, pick a random number from [0; 1) and see where that number falls.

class WeightedRandomPicker {

    private static Random random = new Random();

    public static int choose(double[] probabilities) {
        double randomVal = random.nextDouble();
        double cumulativeProbability = 0;
        for (int i = 0; i < probabilities.length; ++i) {
            cumulativeProbability += probabilities[i];
            if (randomVal < cumulativeProbability) {
                return i;
            }
        }
        return probabilities.length - 1; // to account for numerical errors
    }

    public static void main (String[] args) {
        double[] probabilities = new double[]{0.1, 0.1, 0.2, 0.6}; // the final value is optional
        for (int i = 0; i < 20; ++i) {
            System.out.printf("%d\n", choose(probabilities));
        }
    }
}
NPE
2

The following is a bit like @daniu's answer but makes use of the methods provided by TreeMap:

private final NavigableMap<Double, Runnable> map = new TreeMap<>();
{
    map.put(0.3d, this::branch30Percent);
    map.put(1.0d, this::branch70Percent);
}
private final SecureRandom random = new SecureRandom();

private void branch30Percent() {}

private void branch70Percent() {}

public void runRandomly() {
    final Runnable value = map.tailMap(random.nextDouble(), true).firstEntry().getValue();
    value.run();
}

This way there is no need to iterate the whole map until the matching entry is found; instead the capability of TreeMap to find an entry whose key compares in a specific way to another key is used. This will only make a difference if the number of entries in the map is large, but it does save a few lines of code.

SpaceTrucker
0

I'd do that something like this:

class RandomMethod {
    private final Runnable method;
    private final int probability;

    RandomMethod(Runnable method, int probability){
        this.method = method;
        this.probability = probability;
    }

    public int getProbability() { return probability; }
    public void run()      { method.run(); }
}

class MethodChooser {
    private final List<RandomMethod> methods;
    private final int total;

    MethodChooser(final List<RandomMethod> methods) {
        this.methods = methods;
        this.total = methods.stream().collect(
            Collectors.summingInt(RandomMethod::getProbability)
        );
    }

    public void chooseMethod() {
        final Random random = new Random();
        final int choice = random.nextInt(total);

        int count = 0;
        for (final RandomMethod method : methods)
        {
            count += method.getProbability();
            if (choice < count) {
                method.run();
                return;
            }
        }
    }
}

Sample usage:

MethodChooser chooser = new MethodChooser(Arrays.asList(
    new RandomMethod(Blah::aaa, 1),
    new RandomMethod(Blah::bbb, 3),
    new RandomMethod(Blah::ccc, 1)
));

IntStream.range(0, 100).forEach(
    i -> chooser.chooseMethod()
);


Michael
  • 1
    It's somewhat interesting to see all your complaints on the other answers, saying "not Java" and the like, everything in a passive-aggressive tone. And then see your answer here that does not even include human language; at least I see none that explains your code, which algorithms you used, advantages, drawbacks, etc. etc. Instead of wasting further time on criticizing the other answers, which are quite good actually, why not improve your own first? (P.S.: http://www.differencebetween.com/difference-between-probability-and-vs-chance/) – Sebastian Mach Aug 24 '17 at 09:00
  • @SebastianMach Nothing was said in a passive-aggressive tone, but I can't help it if you interpreted it that way. I tend to keep my comments as succinct as I can. Everything I posted was constructive criticism on how I think their answers could be improved. No one is immune to criticism - there's no need to get upset and take it personally. – Michael Aug 24 '17 at 09:25
  • @SebastianMach Now, I think you undermined yourself a little bit with your last link, but yep you *are* right and I've fixed it – Michael Aug 24 '17 at 09:30
  • 2
    Personally, I am always thankful for links like the one that undermined myself. English not being my native language, there was a non-zero probability that "chance" would have been my first choice, too. – Sebastian Mach Aug 24 '17 at 10:43