
I am trying to create a 9 x 9 probability matrix from a contingency / frequency table.

It contains the frequencies for a pair of values (x1, x2) transitioning to a pair of values (y1, y2). x1 and y1 take the values A, B, or C, and x2 and y2 take the values D, E, or F.

Not every transition between xy pairs occurs in the data. However, I would like these 'missing' transitions to be present as zeros in the table/matrix so that it is square (9 x 9) for use in other analyses.
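Instead of typing the nine combined labels by hand, they can be generated from the two sets of component values. This is a minimal sketch, assuming the first component is always one of A, B, C and the second one of D, E, F as described above; the names `first`, `second`, `pairs`, and the three toy transitions are made up for illustration. Because `x` and `y` are factors that carry all nine levels, `table()` keeps every unobserved combination as a zero count.

```r
# Hypothetical component values, taken from the description above
first  <- c("A", "B", "C")
second <- c("D", "E", "F")

# rep(..., each = 3) gives A A A B B B C C C; `second` is recycled alongside it,
# producing "AD" "AE" "AF" "BD" "BE" "BF" "CD" "CE" "CF"
pairs <- paste0(rep(first, each = length(second)), second)

# Made-up toy transitions (from x to y), only three of them observed
x <- factor(c("AD", "AE", "BF"), levels = pairs)
y <- factor(c("AE", "AD", "BE"), levels = pairs)

# All nine levels appear in both dimensions; unobserved cells are 0
tab <- table(x, y)
dim(tab)
```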

df <- structure(list(x1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
                    3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
                    y1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
                    2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
                    x2 = structure(c(1L,2L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 1L), 
                    .Label = c("D", "E", "F"), class = "factor"), 
                    y2 = structure(c(1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L), 
                    .Label = c("D", "E", "F"), class = "factor"), 
                    x = c("AD", "BE", "CF", "AD", "BD", "CD", "AE", "BE", "CE", "AE", "BF", "CD"), 
                    y = c("AD", "BE", "CF", "AE", "BD", "CD", "AD", "BD", "CD", "AE", "BE", "CF")),
                    .Names = c("x1", "y1", "x2", "y2", "x", "y"), row.names = c(NA, -12L), class = "data.frame")

# df$x <- paste0(df$x1, df$x2) # included in the dput
# df$y <- paste0(df$y1,df$y2)
# convert to factor to include all transitions http://stackoverflow.com/a/13705236/1670053
df$x <- factor(df$x, levels = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF"))
df$y <- factor(df$y, levels = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF"))

t1 <- with(df, table(x, y))
# t1m <- as.data.frame.matrix(t1)
t2 <- t1/(colSums(t1))
dfm <- as.data.frame.matrix(t2)
#dm <- as.matrix(dfm)

Without converting x and y to factors, the result dfm above has the correct values but, of course, does not include the full set of 9 x 9 transitions. The desired result, DFMd, is below.

However, when I use the factored x and y, the result is not what I want: NaN and Inf values are introduced.
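The NaN and Inf values come from R's recycling rule. In `t1/colSums(t1)` the vector of column sums is recycled down the columns of the matrix (column-major order), so row i of `t1` is divided by `colSums(t1)[i]`, not column j by `colSums(t1)[j]`. Any row whose index matches an empty column is divided by 0. A sketch on a hypothetical 3 x 3 toy table (labels `a`, `b`, `c` are made up), contrasting that with `sweep()`, which divides along the margin you name:

```r
# Toy count table, filled column by column; column "b" (a destination never reached) is all 0
t1 <- matrix(c(1, 1, 0,
               0, 0, 0,
               0, 2, 2),
             nrow = 3,
             dimnames = list(x = c("a", "b", "c"), y = c("a", "b", "c")))

# Recycling: row i is divided by colSums(t1)[i]; row "b" is divided by 0
wrong <- t1 / colSums(t1)

# sweep() with MARGIN = 2 divides column j by colSums(t1)[j]
right <- sweep(t1, 2, colSums(t1), "/")

# The empty column gives 0/0 = NaN; treat it as probability 0
right[is.nan(right)] <- 0
```

Here `wrong` contains both Inf and NaN, while every non-empty column of `right` sums to 1.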

Is there a way to compute table/colSums(table) with these 'missing' factor levels and get the desired result?

DFMd <- structure(list(AD = c(0.5, 0.5, 0, 0, 0, 0, 0, 0, 0), AE = c(0.5, 
0.5, 0, 0, 0, 0, 0, 0, 0), AF = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), BD = c(0, 0, 0, 0.5, 0.5, 0, 0, 0, 0), BE = c(0, 0, 
0, 0, 0.5, 0.5, 0, 0, 0), BF = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), CD = c(0, 0, 0, 0, 0, 0, 0.5, 0.5, 0), CE = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), CF = c(0, 0, 0, 0, 0, 0, 0.5, 0, 
0.5)), .Names = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", 
"CF"), class = "data.frame", row.names = c("AD", "AE", "AF", 
"BD", "BE", "BF", "CD", "CE", "CF"))
nofunsally
1 Answer


I am still unsure why the code above produces Inf values (or otherwise wrong values), but the code below gives the desired output. It does seem a bit convoluted.

t1 <- with(df, table(x, y))          # contingency table
tcc <- as.matrix(colSums(t1))        # column sums as a 9 x 1 matrix
tc <- as.data.frame.matrix(tcc)      # store as data.frame for the rep() code below
tct <- t(tc)                         # transpose to a 1 x 9 row of column sums
# repeat that row 9 times to get a 9 x 9 matrix of column sums
# http://stackoverflow.com/a/11121463/1670053
tcx <- tct[rep(seq_len(nrow(tct)), each = 9), ]

pmat <- t1/tcx                       # transition matrix
pmat[is.na(pmat)] <- 0               # replace NaN from 0/0 with 0
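Building the 9 x 9 matrix of column sums by hand works, but base R's `prop.table()` with `margin = 2` already divides every column of a table by that column's sum. A sketch of that shorter route, rebuilding only the `x` and `y` vectors from the question's dput so it runs on its own; the `pairs` name is introduced here for illustration. As in the answer above, the empty destination columns (AF, BF, CE) give 0/0 = NaN, which are then set to 0.

```r
# x and y transition labels copied from the question's df,
# with all nine pairs as factor levels so missing transitions become 0 counts
pairs <- c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF")
x <- factor(c("AD", "BE", "CF", "AD", "BD", "CD", "AE", "BE", "CE", "AE", "BF", "CD"),
            levels = pairs)
y <- factor(c("AD", "BE", "CF", "AE", "BD", "CD", "AD", "BD", "CD", "AE", "BE", "CF"),
            levels = pairs)

t1 <- table(x, y)

# margin = 2: each column is divided by its own column sum
pmat <- prop.table(t1, margin = 2)

# Columns with no observations are 0/0 = NaN; set them to 0
pmat[is.nan(pmat)] <- 0

dfm <- as.data.frame.matrix(pmat)
```

With this data the result matches DFMd in the question: every observed destination column sums to 1 and the AF, BF, and CE columns are all 0.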
nofunsally