
My dataframe has two columns holding the mean number of goals Team_A and Team_B score in a match. For each row, I want to create a matrix covering all possible scorelines from 0 to 3 goals per team (4 by 4), using a Poisson distribution. Here are the first few rows of my data:

import pandas as pd

d = {'Team_A':[2.0160, 1.3421, 2.4654, 3.0281], 'Team_B':[0.0653, 1.5641, 4.0241, 1.2375]}
df = pd.DataFrame(data=d)

So, from the first row, Team A should win the match 2-0 (means rounded to the nearest integer). Assuming the scores are independent and goals occur within a fixed interval, and using the formula for the Poisson distribution,

P(k goals in interval) = ((lambda**k) * exp(-lambda))/factorial(k)

where k = [0,1,2,3]

Team A scores 0, 1, 2 and 3 goals with probabilities [0.1332, 0.2685, 0.2707, 0.1819] respectively.
Team B scores 0, 1, 2 and 3 goals with probabilities [0.5205, 0.3399, 0.1110, 0.0242] respectively.
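
A minimal sketch of the formula above, using only the standard library. `poisson_pmf` is a hypothetical helper name. One caveat: the Team B probabilities quoted here correspond to a mean of 0.653 rather than the 0.0653 shown in the sample data, so the sketch assumes 0.653 in order to reproduce them.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the mean number of goals is lam."""
    return (lam ** k) * exp(-lam) / factorial(k)


goals = [0, 1, 2, 3]

# Team_A mean taken from the first row of the sample data.
prob_a = [poisson_pmf(k, 2.0160) for k in goals]

# ASSUMPTION: 0.653 (not 0.0653) is the mean that matches the quoted Team B figures.
prob_b = [poisson_pmf(k, 0.653) for k in goals]

print([round(p, 4) for p in prob_a])
print([round(p, 4) for p in prob_b])
```

Note that the four probabilities do not sum to 1, because scores above 3 goals are left out.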

The table below is constructed by multiplying each Team B probability by each Team A probability (an outer product of the two vectors).

For example, the implied probability of a 2-0 Team A win = 0.2707 * 0.5205 = 0.140899

                        Team_A goals                    0       1       2       3
    Team_B goals     Poisson prob. of goal count    0.1332  0.2685  0.2707  0.1819
      0                     0.5205                  0.0693  0.1398  0.1409  0.0947
      1                     0.3399                  0.0453  0.0913  0.0920  0.0618
      2                     0.1110                  0.0148  0.0298  0.0300  0.0202
      3                     0.0242                  0.0032  0.0065  0.0065  0.0044

Question

I'm lost on how to write a Python function that loops through each row and creates such a matrix.

A.Z

2 Answers


I'm lost on how to write a Python function that loops through each row and creates such a matrix.

I take it you've already dealt with the statistical part of the problem (calculating the probabilities from the Poisson distribution), am I right?

If so, you could use itertools.product to create your table.
Let's say that prob_a and prob_b are two arrays containing the probabilities for Team A and Team B respectively. The matrix is built this way:

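from itertools import product
import numpy as np
import pandas as pd  # needed for the DataFrame conversion below

# product(prob_b, prob_a) varies prob_a fastest, so each row of the table is one Team B goal count
prod_table = np.array([i * j for i, j in product(prob_b, prob_a)])
prod_table = prod_table.reshape(len(prob_b), len(prob_a))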

Now you have a 4x4 matrix with all the values you need, which you can convert back to a pandas dataframe.
In this table, the Team A probabilities label the columns and the Team B probabilities label the rows (which should match your example). So to get a pandas dataframe you could do:

prob_df = pd.DataFrame(prod_table, index=prob_b, columns=prob_a)

And this is the table you are looking for.
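
For the looping part of the question, one possible sketch (certainly not the only way) wraps the steps above in a function and calls it once per row. `score_matrix` and `max_goals` are hypothetical names. The Poisson probabilities are computed with the formula from the question, and `np.outer` is swapped in for the `itertools.product` construction because it gives the same table directly.

```python
from math import exp, factorial

import numpy as np
import pandas as pd


def score_matrix(lam_a, lam_b, max_goals=3):
    """Return scoreline probabilities for 0..max_goals goals per team.

    Rows are Team_B goals and columns are Team_A goals, as in the question's table.
    """
    goals = list(range(max_goals + 1))
    prob_a = np.array([lam_a ** k * exp(-lam_a) / factorial(k) for k in goals])
    prob_b = np.array([lam_b ** k * exp(-lam_b) / factorial(k) for k in goals])
    # table[i, j] = P(Team B scores i) * P(Team A scores j)
    table = np.outer(prob_b, prob_a)
    return pd.DataFrame(
        table,
        index=pd.Index(goals, name="Team_B goals"),
        columns=pd.Index(goals, name="Team_A goals"),
    )


d = {'Team_A': [2.0160, 1.3421, 2.4654, 3.0281],
     'Team_B': [0.0653, 1.5641, 4.0241, 1.2375]}
df = pd.DataFrame(data=d)

# One matrix per row of df, kept in a plain list in row order.
matrices = [score_matrix(row.Team_A, row.Team_B)
            for row in df.itertuples(index=False)]

print(matrices[0].round(4))
```

Labelling the rows and columns with goal counts rather than probabilities lets you look up a scoreline directly, e.g. `matrices[0].loc[0, 2]` for a 2-0 Team A win.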

Valentino

I would use numpy for simple linear algebra operations (e.g. multiplying small matrices).

If you already have a data frame in the shape you want, you can readily convert it to a numpy.ndarray.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

If not, you will want to create an ndarray of zeros and then insert the correct elements in the right places.
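
A rough sketch of both routes, assuming the probability lists quoted in the question. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Probabilities quoted in the question for 0..3 goals.
prob_a = [0.1332, 0.2685, 0.2707, 0.1819]  # Team A
prob_b = [0.5205, 0.3399, 0.1110, 0.0242]  # Team B

# Start from zeros and fill each cell: rows are Team B goals, columns are Team A goals.
table = np.zeros((len(prob_b), len(prob_a)))
for i, pb in enumerate(prob_b):
    for j, pa in enumerate(prob_a):
        table[i, j] = pb * pa

# Going the other way: a DataFrame of the same shape converts back with to_numpy().
frame = pd.DataFrame(table)
as_array = frame.to_numpy()

print(table.round(4))
```

For a table this small the explicit double loop is perfectly readable; `np.outer(prob_b, prob_a)` gives the same result in one call.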

rho