
I am trying to figure out a function that can update probabilities.

Suppose there are three players, and each of them gets a fruit out of a basket: ["apple", "orange", "banana"]

I store the probabilities of each player having each fruit in a matrix (like this table):


          apple   orange  banana
Player 1  0.3333  0.3333  0.3333
Player 2  0.3333  0.3333  0.3333
Player 3  0.3333  0.3333  0.3333

The table can be interpreted as the beliefs of an observer (S) who doesn't know who has what. Each row and each column sums to 1.0, because each player holds exactly one fruit and each fruit is held by exactly one player.

I want to update these probabilities based on some knowledge that S gains. Example information:

Player 1 did X. We know that Player 1 does X with 80% probability if he has an apple. With 50% if he has an orange. With 10% if he has a banana.

This can be written more concisely as [0.8, 0.5, 0.1] and let us call it reach_probability.


A fairly easy to comprehend example is:

probabilities = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]

# Player 1's reach_probability
reach_probability = [1.0, 0.0, 1.0]

new_probabilities = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

The above example can be fairly easily thought through.


Another example:

probabilities = [
    [0.25, 0.25, 0.50],
    [0.25, 0.50, 0.25],
    [0.50, 0.25, 0.25],
]

# Player 1's reach_probability
reach_probability = [1.0, 0.5, 0.5]

new_probabilities = [
    [0.4, 0.2, 0.4],
    [0.2, 0.5, 0.3],
    [0.4, 0.3, 0.3],
]

In my use case running a simulation is not an option; my probabilities matrix is big. I am not sure if the only way to calculate this is an iterative algorithm or if there is a better way.

I looked at Bayesian methods but am not sure how to apply them in this case. Updating the matrix row by row and then spreading the difference out proportionally to the previous probabilities seemed promising, but I haven't managed to make it work correctly. Maybe it isn't even possible like that.

Hadus
  • This is a really interesting problem. I have worked with evidence of the form "it is not X" in the context of Bayesian belief networks. "It is not X" evidence is represented as a likelihood function which is zero for X and 1 for anything else. The effect is just as you describe, that some elements in the probability matrix get clobbered with zero. Looking at your program, I don't know how to fix it, but you should be able to work out the right calculation if you think about computing the posterior P(not X | it is not X) from P(X, not X) (i.e., what you started with) and the likelihood. – Robert Dodier Mar 12 '21 at 16:56
  • Keeping that stuff straight isn't easy, although once you get it sorted out the computations are simple. My advice is take a look at Bayesian inference and belief networks. That's going to be vast overkill but then you'll be able to work out the simple calculation that's needed here. – Robert Dodier Mar 12 '21 at 16:57
  • I wrote a simulation so I could check algorithms against it. The way I intuitively calculated it is often wrong. Looking at bayesian belief networks wasn't very helpful yet. I'll keep looking... Simulation is not an option for my use case as I have a much bigger probability matrix. And it has to be accurate. – Hadus Mar 13 '21 at 19:01
  • "My probabilities matrix is big (1326, 7)" Wait, is it not square? I thought I understood the problem, but that would rule out a 1-1 matching. – David Eisenstat Mar 14 '21 at 14:51
  • @DavidEisenstat I understand how that might be misleading so I removed it. That is how I store it because you might imagine that one of the rows is not a player but the rest of the players. If I can work it out for a simple square that is good enough. – Hadus Mar 14 '21 at 14:57
  • I don't understand the first example. `[1 0 1]` reach proba. So player 1 can have two fruits. However, the first row of the new matrix is `[1 0 0]` – Damien Mar 16 '21 at 17:37
  • @Damien Imagine we can ask **Player 1** what he would do with each fruit. The first number is with what percentage he would do X when he has an **apple**, second is with **orange**... `[1, 1, 1]` would mean for example that regardless of what fruit **Player 1** has he would do X 100% of the time. – Hadus Mar 16 '21 at 19:30
  • What I don't understand is the result, i.e. the 1st row of the matrix – Damien Mar 16 '21 at 19:38
  • Which is the name of this problem? – Rafael Valero Mar 16 '21 at 19:55
  • Player 1 did X. We know that Player 1 does X with 80% probability if he has an apple. With 50% if he has an orange. With 10% if he has a banana. This can be written more concisely as [0.8, 0.5, 0.1] and let us call it reach_probability. – Rafael Valero Mar 17 '21 at 21:01
  • However in the examples reach_probability = [0.8, 0.5, 0.1] – Rafael Valero Mar 17 '21 at 21:02
  • I suspect this something used in Game Theory. – Rafael Valero Mar 17 '21 at 21:02
  • @RafaelValero yes it can be framed in terms of game theory as nodes in a game tree and we got to a new node by the action X. – Hadus Mar 17 '21 at 22:22

4 Answers


Initial condition: p(apple) = p(orange) = p(banana) = 1/3.

Player 1 did X. We know that Player 1 does X with 80% probability if he has an apple. With 50% if he has an orange. With 10% if he has a banana.

p(X | apple) = 0.8
p(X | orange) = 0.5
p(X | banana) = 0.1

Since apple, orange, and banana are all equally likely at 1/3, we have p(X) = (1/3) * (0.8 + 0.5 + 0.1) = 1.4/3 ≈ 0.4667.

Recall Bayes' theorem: p(a | b) = p(b | a) * p(a) / p(b)

So p(apple | X) = p(X | apple) * p(apple) / p(X) = 0.8 * (1/3) / 0.4667 ≈ 57.14%

similarly p(orange | X) = 0.5 * (1/3) / 0.4667 ≈ 35.71%

and p(banana | X) = 0.1 * (1/3) / 0.4667 ≈ 7.14%
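The same arithmetic as a quick Python sketch (just the three lines of Bayes' theorem above, nothing more):

```python
# Bayes update of a single row: posterior ∝ prior * likelihood.
prior = [1/3, 1/3, 1/3]   # p(apple), p(orange), p(banana)
reach = [0.8, 0.5, 0.1]   # p(X | fruit)

p_x = sum(p * r for p, r in zip(prior, reach))           # total probability of X
posterior = [p * r / p_x for p, r in zip(prior, reach)]  # Bayes' theorem
print(posterior)  # ≈ [0.5714, 0.3571, 0.0714]
```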

Taking your example:

probabilities = [
    [0.25, 0.25, 0.50],
    [0.25, 0.50, 0.25],
    [0.50, 0.25, 0.25],
]

# Player 1's reach_probability
reach_probability = [1.0, 0.5, 0.5]

new_probabilities = [
    [0.4, 0.2, 0.4],
    [0.2, 0.5, 0.3],
    [0.4, 0.3, 0.3],
]

p(x) = 0.25 * 1.0 + 0.25 * 0.5 + 0.5 * 0.5 = 0.625
p(a|x) = p(x|a) * p(a) / p(x) = 1.0 * 0.25 / 0.625 = 0.4
p(b|x) = p(x|b) * p(b) / p(x) = 0.5 * 0.25 / 0.625 = 0.2
p(c|x) = p(x|c) * p(c) / p(x) = 0.5 * 0.50 / 0.625 = 0.4

As desired. The other entries of each column can just be scaled to get a column sum of 1.0.

E.g. in column 1 we multiply the other entries by (1 - 0.4)/(1 - 0.25). This takes 0.25 -> 0.2 and 0.50 -> 0.40. Similarly for the other columns.

new_probabilities = [
    [0.4, 0.200, 0.4],
    [0.2, 0.533, 0.3],
    [0.4, 0.266, 0.3],
]

If then player 2 does y with the same conditional probabilities we get:

p(y) = 0.2 * 1.0 + 0.533 * 0.5 + 0.3 * 0.5 = 0.6165
p(a|y) = p(y|a) * p(a) / p(y) = 1.0 * 0.2 / 0.6165 = 0.3244
p(b|y) = p(y|b) * p(b) / p(y) = 0.5 * 0.533 / 0.6165 = 0.4323
p(c|y) = p(y|c) * p(c) / p(y) = 0.5 * 0.266 / 0.6165 = 0.2157
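The whole scheme (Bayes on the acting player's row, then rescaling the other rows so each column still sums to 1.0) can be sketched in Python. `update` is just a name chosen here for illustration, and the sketch assumes no entry of the acting player's old row equals 1.0, which would make the rescaling factor divide by zero:

```python
def update(probabilities, reach, player):
    """Bayes-update `player`'s row, then rescale the other rows so
    every column still sums to 1.0 (the scheme described above).
    Assumes every old entry in `player`'s row is < 1.0."""
    row = probabilities[player]
    # total probability of the observed action under the current beliefs
    p_x = sum(p * r for p, r in zip(row, reach))
    new_row = [p * r / p_x for p, r in zip(row, reach)]
    result = []
    for i, r in enumerate(probabilities):
        if i == player:
            result.append(new_row)
        else:
            # scale column j by (1 - new)/(1 - old) to restore the column sum
            result.append([r[j] * (1 - new_row[j]) / (1 - row[j])
                           for j in range(len(r))])
    return result

probabilities = [
    [0.25, 0.25, 0.50],
    [0.25, 0.50, 0.25],
    [0.50, 0.25, 0.25],
]
new = update(probabilities, [1.0, 0.5, 0.5], player=0)
# row 0 becomes [0.4, 0.2, 0.4]; every column still sums to 1.0
```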
Dave
  • There we go this looks very good. I'll see how practical this is in my solution but this is definitely the answer to the question I asked. Thanks! – Hadus Mar 18 '21 at 19:39

Check this document: Endgame Solving in Large Imperfect-Information Games (S. Ganzfried, T. Sandholm, in International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2015, pp. 37–45).

LFP

Here is how I would approach this - I have not worked through whether it has problems too, but it seems all right on your examples.

Assume each update is of the form "X,Y has probability p'". Mark element X,Y dirty with delta p - p', where p was the old probability. Now redistribute the delta proportionally to all unmarked elements in the row, then in the column, marking each recipient dirty with its own delta, and marking the original entry clean. Continue until no dirty entries remain.

0.5   0.5   0.0
0.0   0.5   0.5
0.5   0.0   0.5

Belief: 2,1 has probability zero.

0.5   0.0*  0.0    update 2,1 and mark dirty
0.0   0.5   0.5    delta is 0.5
0.5   0.0   0.5

1.0*  0.0'  0.0    distribute 0.5 to row & col
0.0   1.0*  0.5    update as dirty, both deltas -0.5
0.5   0.0   0.5

1.0'  0.0'  0.0    distribute -0.5 to rows & cols
0.0   1.0'  0.0*   update as dirty, both deltas 0.5
0.0*  0.0   0.5

1.0'  0.0'  0.0    distribute 0.5 to row & col
0.0   1.0'  0.0'   update as dirty, delta is -0.5
0.0'  0.0   1.0*

1.0'  0.0'  0.0    distribute on row/col
0.0   1.0'  0.0'   no new dirty elements, complete
0.0'  0.0   1.0'

Starting from your initial uniform table:

1/3   1/3   1/3
1/3   1/3   1/3
1/3   1/3   1/3

Belief: 3,1 has probability 0

1/3   1/3   0*     update 3,1 to zero, mark dirty
1/3   1/3   1/3   delta is 1/3
1/3   1/3   1/3

1/2*  1/2*  0'    distribute 1/3 proportionally across row then col
1/3   1/3   1/2*  delta is -1/6
1/3   1/3   1/2*

1/2'  1/2'  0'    distribute -1/6 proportionally across row then col
1/4*  1/4*  1/2'  delta is 1/12
1/4*  1/4*  1/2'

1/2'  1/2'  0'    distribute proportionally to unmarked entries
1/4'  1/4'  1/2' no new dirty entries, terminate
1/4'  1/4'  1/2'

You can mark entries dirty by inserting them with associated deltas into a queue and a hashset. Entries in both the queue and hash set are dirty. Entries in the hashset only are clean. Process the queue until you run out of entries.

I do not show an example where distribution is uneven, but the key is to distribute proportionally. Entries with 0 can never become non-zero except by a new belief.
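The bookkeeping above can be sketched as follows. This is a minimal interpretation of the scheme: `apply_belief` and the deque-plus-set bookkeeping are my choices, not fixed by the description. It reproduces the first worked trace above:

```python
from collections import deque

def apply_belief(matrix, row, col, value):
    """Set matrix[row][col] = value, then propagate: each dirty entry's
    delta is redistributed proportionally to the still-clean entries in
    its row and then its column, and each recipient becomes dirty."""
    n = len(matrix)
    delta = value - matrix[row][col]
    matrix[row][col] = value
    marked = {(row, col)}            # dirty or already-processed entries
    queue = deque([(row, col, delta)])
    while queue:
        i, j, d = queue.popleft()
        for line in (  # clean entries sharing the row, then the column
            [(i, k) for k in range(n) if (i, k) not in marked],
            [(k, j) for k in range(n) if (k, j) not in marked],
        ):
            total = sum(matrix[a][b] for a, b in line)
            if total == 0:
                continue             # nothing to distribute to
            for a, b in line:
                share = -d * matrix[a][b] / total  # proportional split
                if share != 0:       # zero entries stay zero and stay clean
                    matrix[a][b] += share
                    marked.add((a, b))
                    queue.append((a, b, share))
    return matrix

m = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]
apply_belief(m, 0, 1, 0.0)   # belief: that entry is actually zero
# m converges to the identity matrix, matching the trace above
```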

Patrick87
  • I think this would indeed be correct if I could start with knowledge in the form of "X,Y has probability p". I don't think I have this. I am going to update my question to be clearer then think through your answer more. It seems promising. Thanks. – Hadus Mar 14 '21 at 13:06
  • I thought at first that the first row can be calculated by just multiplying together the reach probabilities and the first row. Then re-normalizing it. Then we could have info in the form of "X,Y has probability p". But as the harder example shows doing that is incorrect. – Hadus Mar 14 '21 at 13:35
  • answering your comment about decomposing the updates: I don't think there is a way to break up the update. – Hadus Mar 14 '21 at 13:38
  • @Hadus I see your update appears to be multiplying along a row... so in a sense, you have an opportunity to update each of the non-zero entries to any other valid value. Your update can first be scaled so that the result keeps the row with sum 1. Then, we need to find an equivalent sequence of single-change updates which give the same result. This may not be simple. I can work out the case n=3 by hand but it is ugly. I wonder if you couldn't just set the whole row and mark the whole row dirty together. – Patrick87 Mar 14 '21 at 14:19
  • What I am saying is that multiplying along the first row doesn't even give the correct ratio but I am not sure if it becomes correct if we keep following the dirty around. I am going to try your answer with "first be scaled so that the result keeps the row with sum 1." and see what happens :) – Hadus Mar 14 '21 at 14:54
  • I don't think this kind of working out is compatible with a whole row update from what I tried. – Hadus Mar 14 '21 at 15:19
  • @Hadus Probably not. Though, I wonder how your simulation result is compatible with your update. It seems that the first and second entries are about in the same ratio your update suggests, but the third one is off. Maybe the issue is that in the general case you can't just go through once... you might have to keep going around and around until the deltas all fall under some threshold. – Patrick87 Mar 14 '21 at 15:44

Unfortunately there’s no known nice solution.

The way that I would apply Bayesian reasoning is to store a likelihood matrix instead of a probability matrix. (Actually I’d store log-likelihoods to prevent underflow, but that’s an implementation detail.) We can start with the matrix

   Apple  Orange  Banana
1      1       1       1
2      1       1       1
3      1       1       1

representing no knowledge. You could use the all-1/3 matrix instead, but I’ve used 1 to emphasize that normalization is not required. To apply an update like Player 1 doing X with conditional probabilities [0.8, 0.5, 0.1], we just multiply the row element-wise:

   Apple  Orange  Banana
1    0.8     0.5     0.1
2      1       1       1
3      1       1       1

If Player 1 does Y independently with the same conditional probabilities, then we get

   Apple  Orange  Banana
1   0.64    0.25    0.01
2      1       1       1
3      1       1       1
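As a sketch, each observation is one element-wise row multiplication (`observe` is an illustrative name, not from the answer):

```python
likelihood = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]

def observe(likelihood, player, conditionals):
    """Multiply the player's row element-wise by p(action | fruit).
    No normalization is needed for a likelihood matrix."""
    likelihood[player] = [l * c
                          for l, c in zip(likelihood[player], conditionals)]

observe(likelihood, 0, [0.8, 0.5, 0.1])  # Player 1 does X
observe(likelihood, 0, [0.8, 0.5, 0.1])  # Player 1 independently does Y
# likelihood[0] is now ≈ [0.64, 0.25, 0.01]
```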

Now, the rub is that these likelihoods don’t have a nice relationship to probabilities of specific outcomes. All we know is that the probability of a specific matching is proportional to the product of its matrix entries. As a simple example, with a matrix like

   Apple  Orange  Banana
1      1       0       0
2      0       1       0
3      0       1       1

the entry for Player 3 having Orange is 1, yet this assignment has probability 0 because both possibilities for completing the matching have probability 0.

What we need is the permanent, which sums the likelihood of every matching, and the minor for each matrix entry, which sums the likelihood of every matching that makes the corresponding assignment. Unfortunately we don’t know a good exact algorithm for computing the permanent, and experts are skeptical that one exists (the problem is NP-hard, and actually #P-complete). The known approximation employs sampling via Markov chains.
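For small matrices the permanent and the minors can be brute-forced by enumerating all matchings, which makes the Player 3 / Orange point above concrete. `matching_probabilities` is an illustrative name; this enumeration is only feasible for small n, and it assumes the permanent is nonzero:

```python
from itertools import permutations

def matching_probabilities(L):
    """Probability that player i holds fruit j, computed exactly by
    summing the likelihood of every matching (O(n!) - small n only)."""
    n = len(L)
    permanent = 0.0
    # minor[i][j]: total weight of matchings that assign fruit j to player i
    minor = [[0.0] * n for _ in range(n)]
    for sigma in permutations(range(n)):
        w = 1.0
        for i in range(n):
            w *= L[i][sigma[i]]
        permanent += w
        for i in range(n):
            minor[i][sigma[i]] += w
    return [[m / permanent for m in row] for row in minor]

L = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
]
probs = matching_probabilities(L)
# probs[2][1] (Player 3 / Orange) is 0 even though its likelihood entry is 1
```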

David Eisenstat
  • I was thinking this might be NP-hard. I actually need this matrix for sampling possible "worlds" after there were some of these updates. Could we sample from likelihoods easily? So far this looks like the best answer Thanks :) – Hadus Mar 14 '21 at 15:51
  • @Hadus there's a rapidly mixing Markov chain, so if approximate is OK, you're in luck. I'll post more when I get a chance. – David Eisenstat Mar 14 '21 at 15:57
  • Yes approximation is ok if there is no better way. Looking at permanent is pretty high level maths for me so if there are python libraries (or any) that do this that would be amazing. – Hadus Mar 14 '21 at 15:58
  • @Hadus Hmm, so my memory was off. "Rapidly mixing" is polynomial but the exponent is 7 (!). Meanwhile Ryser's formula will give an exact evaluation but takes time O(n 2^n). I don't think either of these are practical for matrices with dimension >1,000. – David Eisenstat Mar 15 '21 at 00:23
  • That is unfortunate. Thank you for looking into it. If you can update your answer to include that I think it will be good enough to get the bounty unless someone else will find some way to make it work. – Hadus Mar 15 '21 at 01:25