
Suppose I have an R×C contingency table, i.e. a table with R rows and C columns. I want a design matrix X of dimension RC × (R + C − 2) that contains the R − 1 “main effects” for the rows and the C − 1 “main effects” for the columns. For example, if R = C = 2 (row levels [0, 1] and column levels [0, 1]) with main effects only, there are various ways to parameterize the design matrix X; below is one way:

1 0
0 1
1 0
0 0

Note that this is 4 × 2 = RC × (R + C − 2): you omit one level of the row factor and one level of the column factor.
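To make the correspondence between cells and rows of X explicit, here is a minimal sketch for R = C = 2. It assumes the cells are listed in row-major order (cell (r, c) is row r*C + c of X) and that level 0 of each factor is the omitted reference level. Both choices are assumptions; under them the result is one valid parameterization, not necessarily the same convention as the matrix shown above.

```python
import numpy as np

R, C = 2, 2

# Enumerate the RC cells in row-major order: cell k is (r, c) = divmod(k, C).
r, c = np.divmod(np.arange(R * C), C)

# Treatment coding (assumed): drop level 0 of each factor and keep one
# indicator column per remaining level, R - 1 for rows and C - 1 for columns.
row_effects = [(r == i).astype(int) for i in range(1, R)]
col_effects = [(c == j).astype(int) for j in range(1, C)]
X = np.column_stack(row_effects + col_effects)

print(X)
# [[0 0]
#  [0 1]
#  [1 0]
#  [1 1]]
print(X.shape)  # (4, 2)
```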

How can I do this in Python for arbitrary R and C, e.g. R = 3, C = 4 (levels [0 1 2] and [0 1 2 3])? I only have the values of R and C, but I can use them to construct arrays with np.arange(R) and np.arange(C).

iwtbid
  • Can you explain how `R = C = 2` gives you that matrix/array? – Divakar Aug 29 '17 at 13:20
  • Might want to start with what a contingency table is, or at least a link. We're programmers (some of us anyway) not systems engineers. – Daniel F Aug 29 '17 at 13:28
  • Great point, made an edit – iwtbid Aug 29 '17 at 13:43
  • What is the expected output for `R=3, C=4`, for example? I don't see how to get "main effects" from that wikipedia link, especially not based on only the number of rows and columns. If you just want an empty array to put values into later, you can create `np.empty((R, C, R + C - 2))` – Daniel F Aug 29 '17 at 14:56
  • how about `np.zeros((R*C, R + C - 2))` ? OP may also want to check out https://stackoverflow.com/questions/29901436/is-there-a-pythonic-way-to-do-a-contingency-table-in-pandas – tony_tiger Aug 29 '17 at 18:07

2 Answers


The following should work:

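import numpy as np

R = 3
C = 2

# Row-effect columns: indicator of row level i for every cell,
# with cells in row-major order (cell (r, c) is entry r*C + c).
ir = np.zeros((R, C))
ir[0, :] = 1
ir = ir.ravel()

mat = []
for i in range(R):
    mat.append(ir)
    ir = np.roll(ir, C)  # move the block of C ones to the next row level

# Column-effect columns: indicator of column level j for every cell.
ic = np.zeros((R, C))
ic[:, 0] = 1
ic = ic.ravel()

for j in range(C):
    mat.append(ic)
    ic = np.roll(ic, 1)  # shift by one cell to the next column level

# Stack the indicators as columns: shape (R*C, R + C)
mat = np.asarray(mat).T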

and the result is:

array([[ 1.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  1.],
       [ 0.,  1.,  0.,  1.,  0.],
       [ 0.,  1.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  1.]])
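The loop above builds a full indicator set, so `mat` has R + C columns (6 × 5 for R = 3, C = 2) rather than the R + C − 2 the question asks for. A vectorized sketch of the same construction uses `np.kron`; dropping the first column of each block (treating level 0 as the reference, an assumption) then gives the RC × (R + C − 2) matrix. `main_effects_design` is a hypothetical helper name, not part of any library.

```python
import numpy as np


def main_effects_design(R, C, drop_first=True):
    """Hypothetical helper: indicator design matrix for an R x C table.

    Cells are in row-major order (row r*C + c of the result is cell (r, c)),
    matching the loop-based answer. With drop_first=True, level 0 of each
    factor is the reference level, giving shape (R*C, R + C - 2).
    """
    # Each row-level indicator repeated C times (one per column level).
    row_block = np.kron(np.eye(R), np.ones((C, 1)))
    # The column-level indicators tiled R times (once per row level).
    col_block = np.kron(np.ones((R, 1)), np.eye(C))
    if drop_first:
        row_block, col_block = row_block[:, 1:], col_block[:, 1:]
    return np.hstack([row_block, col_block])


full = main_effects_design(3, 2, drop_first=False)  # same values as `mat` above
X = main_effects_design(3, 4)                       # RC x (R + C - 2)
print(full.shape, X.shape)  # (6, 5) (12, 5)
```

Slicing off one column per block is equivalent to choosing a reference level for each factor; keeping a different column (e.g. the last level) is an equally valid parameterization.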

Thanks everyone for your help!

iwtbid

Use scikit-learn's LabelBinarizer or OneHotEncoder to create a design matrix

Since all the labels are in the same column, you can use scikit-learn's preprocessing module: LabelBinarizer and OneHotEncoder convert the labels in a single column into multiple columns, putting a 1 in the column of the label that occurs in each row.

Example input:
NA
PA
PD
NA

After LabelBinarizer:
NA PA PD
1 0 0
0 1 0
0 0 1
1 0 0
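A sketch of how these tools could be applied here, assuming scikit-learn 0.21 or later (needed for OneHotEncoder's `drop` parameter). LabelBinarizer reproduces the NA/PA/PD table above. For the contingency table, each cell can be described by its (row level, column level) pair; OneHotEncoder with `drop="first"` then omits level 0 of each factor, giving an RC × (R + C − 2) matrix. The row-major cell ordering is an assumption.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# LabelBinarizer on the example labels: one column per distinct label.
labels = ["NA", "PA", "PD", "NA"]
lb = LabelBinarizer()
Y = lb.fit_transform(labels)
print(lb.classes_)
print(Y)

# Design matrix for an R x C table: describe each cell by (row, column) level.
R, C = 3, 4
r, c = np.divmod(np.arange(R * C), C)  # row-major cell order (assumed)
cells = np.column_stack([r, c])

# drop="first" removes the indicator for level 0 of each feature.
enc = OneHotEncoder(drop="first")
X = enc.fit_transform(cells).toarray()  # default output is sparse
print(X.shape)  # (12, 5)
```

Without `drop="first"`, the encoder returns all R + C indicator columns, like the loop-based answer.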

psn1997
  • Can you please elaborate on your answer? For instance, you should provide an example as to how these tools can help solve the problem, or at least links to further documentation. – Richard-Degenne Aug 16 '19 at 16:26