
Let's say I have a total number

tN = 12

and a set of elements

elem = [1,2,3,4]

and a probability for each element to be taken

prob = [0.0, 0.5, 0.75, 0.25]

I need to get a random multiset of these elements, such that:

  • the taken elements reflect the probabilities
  • the sum of the taken elements is tN

With the example above, here are some possible outcomes:

3 3 2 4
2 3 2 3 2
3 4 2 3
2 2 3 3 2
3 2 3 2 2

At the moment, the maximum tN will be 64, and the elements are the ones above (1, 2, 3, 4).

Is this a Knapsack problem? How would you easily solve it? Both "on the fly" and "pre-calculated" approaches are allowed (or at least, it depends on the computation time). I'm doing it for a C++ app.

Mission: I don't need to have exactly these percentages in the final sequence. I just want an element with a higher probability to be more likely to appear in the final sequence. In short: in the example, I prefer to get sequences with more 3s and 2s than 4s, and no 1s.

Here's an attempt to select elements according to their probabilities, with 10 draws:

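// Requires <iostream>, <numeric> and <vector>.
// Randomizer is a custom helper (not shown); getRandomValue() is assumed
// to return a uniform float in [0, 1).
Randomizer randomizer;
int tN = 12;
std::vector<int> elem = {2, 3, 4};
std::vector<float> prob = {0.5f, 0.75f, 0.25f};

// Build the cumulative distribution of the normalized probabilities.
float probSum = std::accumulate(begin(prob), end(prob), 0.0f);
std::vector<float> probScaled;
for (size_t i = 0; i < prob.size(); i++)
{
    probScaled.push_back((i == 0 ? 0.0f : probScaled[i - 1]) + (prob[i] / probSum));
}

for (size_t r = 0; r < 10; r++)
{
    float rnd = randomizer.getRandomValue();
    // Default to the last element, so that rounding in the cumulative sum
    // cannot silently fall back to index 0.
    size_t index = probScaled.size() - 1;
    for (size_t i = 0; i < probScaled.size(); i++)
    {
        if (rnd < probScaled[i])
        {
            index = i;
            break;
        }
    }

    std::cout << elem[index] << std::endl;
}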

which gives, for example, these choices:

3
3
2
2
4
2
2
4
3
3
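
For comparison, the same weighted selection could probably be written with the standard library's std::discrete_distribution, which normalizes the weights itself. This is only a sketch: drawWeighted is a hypothetical helper name, and std::mt19937 stands in for the custom Randomizer above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws `count` values from `elem`, each one chosen
// with probability proportional to the matching entry of `weight`.
std::vector<int> drawWeighted(const std::vector<int>& elem,
                              const std::vector<double>& weight,
                              std::size_t count,
                              std::mt19937& rng)
{
    assert(elem.size() == weight.size());

    // discrete_distribution divides every weight by the sum of all weights,
    // so {0.5, 0.75, 0.25} does not need to be normalized by hand.
    std::discrete_distribution<int> pick(weight.begin(), weight.end());

    std::vector<int> out;
    out.reserve(count);
    for (std::size_t r = 0; r < count; ++r)
    {
        out.push_back(elem[static_cast<std::size_t>(pick(rng))]);
    }
    return out;
}
```

Calling drawWeighted({2, 3, 4}, {0.5, 0.75, 0.25}, 10, rng) with a seeded std::mt19937 rng would give ten draws like the ones listed above. An element with weight 0 is never picked.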

Now I just need to build a multiset which sums up to tN. Any tips?
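
One "good effort" way to do this, sketched below under some assumptions, is to first mark which partial sums are reachable (a small coin-change style table, which is cheap for tN up to 64) and then draw elements one at a time, only among those that still leave a reachable remainder. The name randomSequenceWithSum and the use of std::mt19937 and std::discrete_distribution are choices made for this illustration, not something from the code above. Near the end of the sequence the choice gets restricted, so the final frequencies only roughly follow the weights.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns a random sequence of values from `elem`
// whose sum is exactly `total`, or an empty vector if `total` is not
// positive or cannot be reached with the elements of non-zero weight.
std::vector<int> randomSequenceWithSum(const std::vector<int>& elem,
                                       const std::vector<double>& weight,
                                       int total,
                                       std::mt19937& rng)
{
    assert(elem.size() == weight.size());
    std::vector<int> result;
    if (total <= 0) return result;

    // An element is a candidate for remainder s if it has a positive weight,
    // fits into s, and leaves a remainder that can still be completed.
    std::vector<char> reachable(static_cast<std::size_t>(total) + 1, 0);
    reachable[0] = 1;
    auto usable = [&](std::size_t i, int s) {
        return weight[i] > 0.0 && elem[i] > 0 && elem[i] <= s &&
               reachable[static_cast<std::size_t>(s - elem[i])];
    };

    // reachable[s] is true when s is a sum of usable elements.
    for (int s = 1; s <= total; ++s)
        for (std::size_t i = 0; i < elem.size(); ++i)
            if (usable(i, s)) reachable[static_cast<std::size_t>(s)] = 1;

    if (!reachable[static_cast<std::size_t>(total)]) return result;

    int remaining = total;
    std::vector<double> w(elem.size());
    while (remaining > 0)
    {
        // Zero out the weight of every element that would lead to a dead end.
        for (std::size_t i = 0; i < elem.size(); ++i)
            w[i] = usable(i, remaining) ? weight[i] : 0.0;

        // At least one weight is positive because `remaining` is reachable.
        std::discrete_distribution<int> pick(w.begin(), w.end());
        int chosen = elem[static_cast<std::size_t>(pick(rng))];
        result.push_back(chosen);
        remaining -= chosen;
    }
    return result;
}
```

With elem = {1, 2, 3, 4}, weights {0.0, 0.5, 0.75, 0.25} and total = 12, every returned sequence sums to 12 and never contains 1. Since this only produces a sequence, sort it or count the occurrences if a multiset is what you need.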

markzzz
  • Can you elaborate on the probability of each element to be taken? – Marat Jan 13 '22 at 20:21
  • @Marat it is the probability of occurring for each elem, i.e. 0.75 on pElem[2] means "higher prob" (i.e. 75/100) – markzzz Jan 13 '22 at 20:23
  • @markzzz probability of what? There are many possible interpretations here, e.g. a) prob. of occurrence in a randomly sampled subset b) an element of a randomly sampled subset is equal to this number, etc. – Marat Jan 13 '22 at 20:48
  • @Marat I would say a). If you look at my example, "4" appears much less often than 2 and 3 (which have higher probabilities) – markzzz Jan 13 '22 at 20:51
  • @markzzz this conflicts with the definition of multiset as a set of all possible sets, which has this probability fixed. The more I think about this problem, the more ill-defined it looks, and I still can't come up with a probability definition that won't lead to a conflict like this – Marat Jan 13 '22 at 21:03
  • @Marat I need to create a sequence, that's all :) the higher an element's prob, the more suitable it is to be taken – markzzz Jan 13 '22 at 21:10
  • @Marat it doesn't matter. The limit is that the sum of taken elements is tN – markzzz Jan 13 '22 at 21:17
  • What exactly is "reflects" supposed to mean in "the taken elements reflects the prob"? It is impossible to hit the probabilities exactly (as they total 150%), so there has to be some leeway. How much leeway is too much? Your 5-element examples pick `3` only 40% of the time, despite the goal probability of `3` being 75%. How far can this be pushed before the taken elements no longer "reflect" the probability? Is there a hard limit? Or do you just need a "good effort" approach where the probabilities somehow factor into generating the elements? – JaMiT Jan 14 '22 at 01:04
  • @JaMiT good effort approach :) I don't need to have exactly these percentages in the final seq. I just want an element with a higher prob to be more likely to appear in the final seq. In short: in the example, I prefer to get seqs with more 3s and 2s than 4s, and no 1s – markzzz Jan 14 '22 at 06:30
  • Edited the question with further details and some code for picking by prob (which seems to be confusing you :) ) – markzzz Jan 14 '22 at 09:55
